All posts by Jim

Software engineer from Crete living in Switzerland; coffee addict; firearm lover; C# paladin; cryptography enthusiast; perpetually tormented by 3 beautiful women :-)

Running Groovy scripts in JAMS Scheduler

Here at work, we’re working on a migration project, from Jenkins (which we’ve been using as a scheduler) to JAMS Scheduler. In Jenkins we have a lot of Groovy scripts, and we have them in source control. So, to make the migration as effortless as possible, we wanted to use them “as-is”, right out of source control.

The solution I found was:

  1. On the JAMS agent, install the Subversion command-line client
  2. Also on the JAMS agent, install Groovy
  3. Create a job that gets (“checks out”) the latest scripts every evening from source control in a specific directory; let’s call it c:\jobs
  4. Create a JAMS Execution Method called Groovy (see below)
  5. Create the Jenkins jobs in JAMS, one by one. In the source box, only write the full path of the groovy script, e.g. c:\jobs\TransferOrders.groovy

#4 is where the magic happens. The execution method is defined as a PowerShell method. In the template, there’s code that (surprise) calls Groovy. The PowerShell code is the following (see if you can spot a couple of tricks):

#
# Source: DotJim blog (https://dandraka.com)
# Jim Andrakakis, December 2018
#
Import-Module JAMS

# the job's source is supposed to contain ONLY 
# the full path to the groovy script, without quotes
$groovy = "C:\app\groovy-2.5.4\bin\groovy.bat"
$groovyScript="<<JAMS.Current.Source>>"

Write-Host "[JAMS-GROOVY] Running script $groovyScript via $groovy"
if ((Test-Path -Path $groovy) -ne $true)
{
	Write-Error "[JAMS-GROOVY] Groovy executable $groovy not found, is Groovy installed?"
}
if ((Test-Path -Path $groovyScript) -ne $true)
{
	Write-Error "[JAMS-GROOVY] Source file $groovyScript not found"
}

$currentJob = Get-JAMSEntry {JAMS.JAMSEntry} 
$currentJobParams = $currentJob.Parameters
$currentJobParamNames = $currentJobParams.Keys

foreach($n in $currentJobParamNames)
{
	[string]$v = $currentJobParams[$n].Value
	
	# look for replacement tokens
	# in the form of <<ParamName>>
	foreach($r in $currentJobParamNames)
	{
		if ($v.Contains("<<$r>>"))
        {
            [string]$replVal = $currentJobParams[$r].Value
            $v = $v.Replace("<<$r>>", $replVal)
        }
	}
	
	Write-Host "[JAMS-GROOVY] Setting parameter $n = $v"
	[Environment]::SetEnvironmentVariable($n, $v, "Process")
}

# execute the script in groovy
& $groovy $groovyScript

Write-Host "[JAMS-GROOVY] script finished"

Two tricks to note here:

  • Almost all our groovy scripts have parameters; Jenkins inserts the parameters as environment variables so the scripts can do:
myVar = System.getenv()['myVar']

The first powershell loop does exactly that; it maps all the job’s parameters, defined or inherited, as environment variables, so the scripts can continue to work happily, no change needed.

  • The second trick is actually an enhancement. As the scripts get promoted through our environments (development > test > integration test > production), some parts of the parameters change, but not all of them.

For example, let’s say there’s a parameter for an inputDirectory.
In the development server, it has the value c:\documents\dev\input. In test, it’s c:\documents\test\input, in integration test it’s c:\documents\intg\input and in production c:\documents\prod\input.

What we can do now is have a folder-level parameter, defined on the JAMS folder that holds our job definitions, which is not transferred from environment to environment. And we can have job-level parameters whose values get substituted using the familiar JAMS <<param>> notation.

So, for example, let’s say I define a folder parameter named “SERVERLEVEL”, which will have the value of “dev” in development, “test” in test etc. In the job, I define another parameter called inputDirectory. This will have the value c:\documents\<<SERVERLEVEL>>\input.

Et voilà! Now we can promote the jobs from environment to environment, completely unchanged. In Jenkins we couldn’t do that; we had to define different values for parameters in dev, in test etc.
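
To make the two tricks concrete, here is a minimal sketch of the same logic in Python. It is not what JAMS runs (that is the PowerShell template above); the function names and the sample parameter dictionary are made up for illustration. Like the PowerShell loops, it does a single substitution pass using the raw values of the other parameters, so a replacement value that itself contains a token is not guaranteed to be resolved.

```python
import os


def resolve_parameters(params):
    """Replace <<Name>> tokens in every value with the value of parameter Name."""
    resolved = {}
    for name, value in params.items():
        for other_name, other_value in params.items():
            token = "<<" + other_name + ">>"
            if token in value:
                value = value.replace(token, other_value)
        resolved[name] = value
    return resolved


def export_to_environment(resolved):
    """Expose every parameter as a process-level environment variable."""
    for name, value in resolved.items():
        os.environ[name] = value


# Hypothetical parameters: SERVERLEVEL would live on the folder,
# inputDirectory on the job itself.
job_params = {
    "SERVERLEVEL": "dev",
    "inputDirectory": r"c:\documents\<<SERVERLEVEL>>\input",
}

resolved = resolve_parameters(job_params)
export_to_environment(resolved)
print(os.environ["inputDirectory"])  # c:\documents\dev\input
```

Changing only SERVERLEVEL to “test” would yield c:\documents\test\input, which is the whole point: the job definition itself stays untouched.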

Here’s the export xml of the execution method:

<?xml version="1.0" encoding="utf-8"?>
<JAMSObjects>
  <method
    name="Groovy"
    type="Routine">
    <description><![CDATA[Run a pre-fetched groovy script. The job's source should contain the full path to the groovy script.

Note: in the "Bad regex pattern", the execution method looks for "Caught:" to try to understand whether 
groovy encountered an exception or not. Here's an example of the groovy output of a script where
an unhandled exception occurred:

Hello, world!
Caught: java.lang.NullPointerException: Cannot invoke method test() on null object
java.lang.NullPointerException: Cannot invoke method test() on null object
        at test1.run(test1.groovy:4)]]></description>
    <template><![CDATA[Import-Module JAMS

# the job's source is supposed to contain ONLY 
# the full path to the groovy script, without quotes
$groovy = "C:\app\groovy-2.5.4\bin\groovy.bat"
$groovyScript="<<JAMS.Current.Source>>"

Write-Host "[JAMS-GROOVY] Running script $groovyScript via $groovy"
if ((Test-Path -Path $groovy) -ne $true)
{
	Write-Error "[JAMS-GROOVY] Groovy executable $groovy not found, is Groovy installed?"
}
if ((Test-Path -Path $groovyScript) -ne $true)
{
	Write-Error "[JAMS-GROOVY] Source file $groovyScript not found"
}

$currentJob = Get-JAMSEntry {JAMS.JAMSEntry} 
$currentJobParams = $currentJob.Parameters
$currentJobParamNames = $currentJobParams.Keys

foreach($n in $currentJobParamNames)
{
	[string]$v = $currentJobParams[$n].Value
	
	# look for replacement tokens
	# in the form of <<ParamName>>
	foreach($r in $currentJobParamNames)
	{
		if ($v.Contains("<<$r>>"))
        {
            [string]$replVal = $currentJobParams[$r].Value
            $v = $v.Replace("<<$r>>", $replVal)
        }
	}
	
	Write-Host "[JAMS-GROOVY] Setting parameter $n = $v"
	[Environment]::SetEnvironmentVariable($n, $v, "Process")
}

# execute the script in groovy
& $groovy $groovyScript

Write-Host "[JAMS-GROOVY] script finished"]]></template>
    <properties>
      <property
        name="HostAssemblyName"
        typename="System.String"
        value="JAMSPSHost" />
      <property
        name="HostClassName"
        typename="System.String"
        value="MVPSI.JAMS.Host.PowerShell.JAMSPSHost" />
      <property
        name="StartAssemblyName"
        typename="System.String"
        value="" />
      <property
        name="StartClassName"
        typename="System.String"
        value="" />
      <property
        name="EditAssemblyName"
        typename="System.String"
        value="" />
      <property
        name="EditClassName"
        typename="System.String"
        value="" />
      <property
        name="ViewAssemblyName"
        typename="System.String"
        value="" />
      <property
        name="ViewClassName"
        typename="System.String"
        value="" />
      <property
        name="BadPattern"
        typename="System.String"
        value="^Caught\:" />
      <property
        name="ExitCodeHandling"
        typename="MVPSI.JAMS.ExitCodeHandling"
        value="ZeroIsGood" />
      <property
        name="GoodPattern"
        typename="System.String"
        value="" />
      <property
        name="SpecificInformational"
        typename="System.String"
        value="" />
      <property
        name="SpecificValues"
        typename="System.String"
        value="" />
      <property
        name="SpecificWarning"
        typename="System.String"
        value="" />
      <property
        name="Force32Bit"
        typename="System.Boolean"
        value="false" />
      <property
        name="ForceV2"
        typename="System.Boolean"
        value="false" />
      <property
        name="HostLocally"
        typename="System.Boolean"
        value="false" />
      <property
        name="Interactive"
        typename="System.Boolean"
        value="false" />
      <property
        name="NoBOM"
        typename="System.Boolean"
        value="false" />
      <property
        name="SourceFormat"
        typename="MVPSI.JAMS.SourceFormat"
        value="Text" />
      <property
        name="EditAfterStart"
        typename="System.Boolean"
        value="false" />
      <property
        name="EditSource"
        typename="System.Boolean"
        value="false" />
      <property
        name="Extension"
        typename="System.String"
        value="ps1" />
      <property
        name="JobModule"
        typename="System.String"
        value="" />
      <property
        name="SnapshotSource"
        typename="System.Boolean"
        value="false" />
      <property
        name="Redirect"
        typename="MVPSI.JAMS.Redirect"
        value="All" />
      <property
        name="HostSubDirectory"
        typename="System.String"
        value="" />
      <property
        name="HostExecutable"
        typename="System.String"
        value="JAMSHost.exe" />
    </properties>
  </method>
</JAMSObjects>
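
The “Bad regex pattern” idea can be tried out in isolation. The sketch below is Python, not JAMS. It assumes the pattern is tested against each line of the job output (hence re.MULTILINE), which is my assumption about how the check behaves rather than something taken from the JAMS documentation. The function name is hypothetical; the pattern and the sample output are the ones from the execution method above.

```python
import re

# Same pattern as the BadPattern property of the execution method
BAD_PATTERN = re.compile(r"^Caught\:", re.MULTILINE)


def looks_failed(job_output):
    """Return True if any line of the output starts with 'Caught:'."""
    return BAD_PATTERN.search(job_output) is not None


failing_output = (
    "Hello, world!\n"
    "Caught: java.lang.NullPointerException: Cannot invoke method test() on null object\n"
    "java.lang.NullPointerException: Cannot invoke method test() on null object\n"
    "        at test1.run(test1.groovy:4)\n"
)
clean_output = "Hello, world!\n"

print(looks_failed(failing_output))  # True
print(looks_failed(clean_output))    # False
```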

Powershell: How do you add inline C#?

PowerShell is great for admin tasks. Things like iterating through files and folders, or copying and transforming files, are very easily done. But inevitably there will be things that are easier to do in a “normal” language such as C#.

Trying to solve a problem I had at work, I needed to transform a CSV file by changing the fields (which is easily done in PowerShell) and, at the same time, “get only the highest record of every group”. That is a job for LINQ, which you can use from PowerShell, but it’s cumbersome and results in many, many lines of code.

So I wanted to do this more cleanly, in C#. The general template for including C# inside a PowerShell script is the following:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public static class Magician 
    {
        private static string spell = ""; 
        public static void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public static string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# And here's how to call it:
[DotJim.Powershell.Magician]::DoMagic("Expelliarmus")
$spell = [DotJim.Powershell.Magician]::GetMagicSpells()

Write-Host $spell

Note here that the C# classes don’t have to be static; but if they are, they’re easier to call (no instantiation needed). Of course this only works if all you need to do is provide an input and get a manipulated output. If you need more complex stuff then yes, you can use non-static classes or whatever C# functionality solves your problems. Here’s the previous example, but with a non-static class:

#
# Source: DotJim blog (https://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public class Magician 
    {
        private string spell = ""; 
        public void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# Here's how to create an instance:
$houdini = New-Object -TypeName DotJim.Powershell.Magician
# And here's how to call it:
$houdini.DoMagic("Expelliarmus")
$spell = $houdini.GetMagicSpells()

Write-Host $spell

The main advantage of having C# inside the PowerShell script (and not in a separate dll file) is that it can be deployed very easily with various DevOps tools. Otherwise you need to deploy the dll alongside the script, which can sometimes be a source of trouble.

So here’s my complete working code, which worked quite nicely:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# The purpose of this script is to read a CSV file with bank data
# and transform it into a different CSV.
#
# 1. The Bank class is a POCO to hold the data which I need
#    from every line of the CSV file.
# 2. The Add() method of the BankAggregator class adds the
#    record to the list after checking the data for correctness.
# 3. The Get() method of the BankAggregator class does a
#    LINQ query to get the 1st (max BankNr) bank record
#    from every record with the same Country/BIC.
#    It then returns a list of strings, formatted the way
#    I want for the new (transformed) CSV file.
#
# Here is where I inline the C# code:
Add-Type -Language CSharp @"
using System;
using System.Collections.Generic;
using System.Linq;
namespace DotJim.Powershell {
 public class Bank {
  public int BankNr;
  public string Country;
  public string BIC;
 }
 public static class BankAggregator {
  private static List<Bank> list = new List<Bank>();
  public static void Add(string country, string bic, string bankNr) {
   //For debugging
   //Console.WriteLine(string.Format("{0}{3}{1}{3}{3}{2}", country, bic, bankNr, ";"));
   int mBankNr;
   // Check data for correctness, discard if not ok
   if (string.IsNullOrWhiteSpace(country) ||
    country.Length != 2 ||
    string.IsNullOrWhiteSpace(bic) ||
    string.IsNullOrWhiteSpace(bankNr) ||
    !int.TryParse(bankNr, out mBankNr) ||
    mBankNr <= 0) { // assumes valid bank numbers are positive
    return;
   }
   list.Add(new Bank() {
    BankNr = mBankNr, Country = country, BIC = bic
   });
  }
  public static List<string> Get(string delimiter) {
   // For every record with the same Country & BIC, keep only
   // the record with the highest BankNr
   var bankList = from b in list
    group b by new {
     b.Country, b.BIC
    }
    into bankGrp
    let maxBankNr = bankGrp.Max(x => x.BankNr)
    select new Bank {
     Country = bankGrp.Key.Country,
     BIC = bankGrp.Key.BIC,
     BankNr = maxBankNr
    };
   // Format the list the way I want the new CSV file to look
   return bankList.Select(x => string.Format("{0}{3}{1}{3}{3}{2}",
    x.Country, x.BIC, x.BankNr, delimiter)).ToList();
  }
 }
}
"@;

# Read one or more files with bank data from the same dir
# where the script is located ($PSScriptRoot)
$srcSearchStr = "source_bankdata*.csv"
$SourcePath = $PSScriptRoot
$destPath = $SourcePath

$fields = @("Country","BIC","EmptyField","BankId")

$filesList = Get-ChildItem -Path $SourcePath -Filter $srcSearchStr

foreach ($file in $filesList)
{
    Write-Host "Processing" $file.FullName

    # Fields in the source CSV:
    # BANKNUMMER  = BankNr
    # BANKLAND    = Country
    # BANKSWIFT   = BIC
    $data = Import-Csv -Path $file.FullName -Delimiter ";"

    foreach ($item in $data)
    {
        # Call the C# code to add the CSV lines to the list
        [DotJim.Powershell.BankAggregator]::Add($item.BANKLAND, $item.BANKSWIFT, $item.BANKNUMMER)
    }

    # Call the C# code to get the transformed data
    $list = [DotJim.Powershell.BankAggregator]::Get(";")

    Write-Host "Found" $list.Count "valid rows"

    # Now that we have the list, write it in the new CSV
    Out-File -FilePath "$destPath\transformed_bankdata_$(New-Guid).csv" -Encoding UTF8 -InputObject $list
}
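
If the LINQ query looks opaque, the “highest record of every group” step can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not a replacement for the C# above: the function names and the sample rows are invented, and the validation that Add() performs is left out.

```python
def highest_per_group(records):
    """For every (country, bic) pair keep only the highest bank number.

    records is an iterable of (country, bic, bank_nr) tuples.
    """
    best = {}
    for country, bic, bank_nr in records:
        key = (country, bic)
        if key not in best or bank_nr > best[key]:
            best[key] = bank_nr
    return best


def format_rows(best, delimiter=";"):
    """Produce Country;BIC;<empty field>;BankNr lines, like the C# Get() method."""
    return [
        f"{country}{delimiter}{bic}{delimiter}{delimiter}{bank_nr}"
        for (country, bic), bank_nr in best.items()
    ]


# Invented sample data: two records share the same Country/BIC
sample_rows = [
    ("CH", "UBSWCHZH", 230),
    ("CH", "UBSWCHZH", 240),
    ("DE", "DEUTDEFF", 100),
]

for line in format_rows(highest_per_group(sample_rows)):
    print(line)
# CH;UBSWCHZH;;240
# DE;DEUTDEFF;;100
```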

Have fun coding!

My bread recipes

[UPDATE 03.2019] added a Brioche recipe.

I recently bought a bread machine, an Unold 8695 Onyx, and I’m very, very happy with it. It’s a simple machine, nothing fancy (whenever I hear of appliances that are “connected”, “internet enabled” or, God forbid, “on the blockchain” I run away), but it’s great value for money and gets the job done very well.

The manual is excellent, with detailed timing tables and recipes, which I fully recommend. That said, I took the recipes that I liked most (the humble white bread and the farmer’s bread) and customized them a bit.

These are the ingredients, in the order in which I put them in the bowl:

Brioche

Ingredient | For 600 gr bread
White flour (Zopfmehl, type 405) | 390 ml
Salt | 3/4 teasp. (4 gr)
Sugar | 2 tblsp. (40 gr)
Vanilla sugar | 1 pkg (8 gr)
Whole egg | 1
Egg yolk | 1
Yeast, fresh | 1/2 cube
Milk | 160 ml
Butter | 80 gr

Important note: put everything in the bread maker bowl, in that order, except the milk and the butter. Then heat the milk and the butter just slightly (do not boil!) until the butter is almost melted. Then pour the milk-butter mix in the bowl over the other ingredients.

Use the Sweet (“Hefekuchen”) or Quick (“Schnell”) program, size 1 (“Stufe 1”) and light crust setting.

White bread

Ingredient | For 500 gr bread | For 800 gr bread
Water | 230 ml | 300 ml
Salt | 3/4 teasp. (4 gr) | 1 teasp. (6 gr)
Honey | 2 tblsp. (40 gr) | 2.5 tblsp. (52 gr)
Wheat semolina (or Corn polenta) | 100 gr | 126 gr
Whole wheat flour (Ruchmehl) or light whole wheat flour (Halbweissmehl) | 20 gr | 30 gr
White flour (Weissmehl, type 550, preferably with vitamins) | 280 gr | 356 gr
Yeast (if fresh yeast is used, use 1/2 a cube in both cases) | 5 gr | 7 gr (1 package)

Farmer’s bread

Ingredient | For 800 gr bread
Water | 320 ml
Leaven (Sauerteig; in CH, I can only find leaven powder in Coop) | 10 gr (1 package)
Salt | 1 teasp. (6 gr)
Butter or margarine | 20 gr
Honey | 2.5 tblsp. (52 gr)
Light whole wheat flour (Halbweissmehl) | 400 gr
White flour (Weissmehl, type 550, preferably with vitamins) | 100 gr
Yeast, fresh | 1/2 cube

For both the white bread and the farmer’s bread, I then use the “Quick” (“Schnell”) program, with light or medium crust. 1h 40min later, it’s ready.

Enjoy!

Citrix on Ubuntu 18.04

I recently changed from Win10 to Ubuntu 18.04 as my main OS at home. I still have Windows in a few VMs, as I need to do the occasional development with Visual Studio.

But one problem I had was that I needed to connect to the office when working from home.

Now, at work we have Citrix Netscaler Gateway. And there’s a Linux client available. It worked, but not as smoothly as I hoped 🙂

Here’s what I did:

From Ubuntu’s Software Center, I installed Citrix Receiver.

Then it asked for the server and tried to connect, but I was getting an error: “An SSL connection to the server could not be established because the server’s certificate could not be trusted.”

So I opened a terminal and gave the following commands (source):

sudo ln -s /usr/share/ca-certificates/mozilla/* /opt/Citrix/ICAClient/keystore/cacerts/

sudo c_rehash /opt/Citrix/ICAClient/keystore/cacerts/

After that it connected, but it was still giving an error: “A protocol error occured while communicating with the Authentication Service”

So after some sleuthing, I opened my browser (Chrome) and connected to my company’s Citrix server address (https://server). When I clicked the apps there, it worked.

Powershell & Microsoft Dynamics CRM: how to get results using a FetchXml

If you’ve used Microsoft CRM as a power user (on-premise or online), chances are you’ve come across the standard way of querying CRM data, FetchXml.

You can run this by hand but of course the real power of it is using it to automate tasks. And another great way to automate tasks in Windows is, naturally, powershell.

So here’s a script I’m using to run a fetch xml and export the results to a csv file:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, May 2018
#
# ============ Constants to change ============
# note: create pwd file with the following command:
# read-host -assecurestring | convertfrom-securestring | out-file C:\temp\crmcred.pwd
$pwdFile = "C:\temp\crmcred.pwd"
$username = "myusername@mycompany.com"
$serverurl = "https://my-crm-instance.crm4.dynamics.com"
$fetchXmlFile = "c:\temp\fetch.xml"
$exportfile = "C:\temp\crm_export.csv"
$exportdelimiter = ";"
# =============================================
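# ============ Login to MS CRM ============
# The Connect-Crm* and Get-CrmRecordsByFetch cmdlets come from the
# Microsoft.Xrm.Data.Powershell module, which must be installed first
Import-Module Microsoft.Xrm.Data.Powershell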
$password = get-content $pwdFile | convertto-securestring
$cred = new-object -typename System.Management.Automation.PSCredential -argumentlist $username,$password
try
{
    # for on-prem use :
    # $connection = Connect-CrmOnPremDiscovery -Credential $cred -ServerUrl $serverurl
    $connection = Connect-CRMOnline -Credential $cred -ServerUrl $serverurl
    # you can also use interactive mode if you get e.g. problems with multi-factor authentication
    #$connection = Connect-CrmOnlineDiscovery -InteractiveMode -Credential $cred
}
catch
{
    Write-Host $_.Exception.Message
    exit
}
if($connection.IsReady -ne $True)
{
    $errorDescr = $connection.LastCrmError
    Write-Host "Connection not established: $errorDescr"
    exit
}
# ============ Fetch data ============
$fetchXml = [xml](Get-Content $fetchXmlFile)
$result = Get-CrmRecordsByFetch -conn $connection -Fetch $fetchXml.OuterXml
# ============ Write to file ============
# Obviously here, instead of writing to csv directly, you can loop and do whatever suits your needs, e.g. run a db query, call a web service etc etc
$result.CrmRecords | Select -Property lastname, firstname | Export-Csv -Encoding UTF8 -Path $exportfile -NoTypeInformation -Delimiter $exportdelimiter

When you use your own FetchXml, do remember to change the properties in the last line (lastname, firstname).

For a quick test, the example FetchXml I’m using is the following:

<fetch mapping="logical" version="1.0">
    <entity name="account">
        <attribute name="customertypecode" alias="customertypecode"/>
        <attribute name="name" alias="company_name"/>
        <attribute name="emailaddress1" alias="company_emailaddress1"/>
        <link-entity name="contact" from="accountid" to="accountid" link-type="inner">
            <attribute name="lastname" alias="lastname"/>
            <attribute name="firstname" alias="firstname"/>
        </link-entity>
    </entity>
</fetch>
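
When you swap in your own FetchXml, it helps to know up front which column names to put in that last Select line. In the example, the selected properties (lastname, firstname) match the alias values in the FetchXml, so the small Python sketch below simply lists every alias, falling back to the attribute name when no alias is given. It only parses the XML and does not talk to CRM; the function name is made up, and whether your result properties follow the aliases exactly is an assumption worth checking against your own output.

```python
import xml.etree.ElementTree as ET

# The example FetchXml from this post
FETCH_XML = """<fetch mapping="logical" version="1.0">
    <entity name="account">
        <attribute name="customertypecode" alias="customertypecode"/>
        <attribute name="name" alias="company_name"/>
        <attribute name="emailaddress1" alias="company_emailaddress1"/>
        <link-entity name="contact" from="accountid" to="accountid" link-type="inner">
            <attribute name="lastname" alias="lastname"/>
            <attribute name="firstname" alias="firstname"/>
        </link-entity>
    </entity>
</fetch>"""


def list_aliases(fetch_xml):
    """Return the alias (or, if missing, the name) of every attribute element."""
    root = ET.fromstring(fetch_xml)
    return [attr.get("alias", attr.get("name")) for attr in root.iter("attribute")]


print(list_aliases(FETCH_XML))
# ['customertypecode', 'company_name', 'company_emailaddress1', 'lastname', 'firstname']
```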

Have fun coding!

Do execution plans change when using different filter values?

(short answer: yes!)

Anyone who develops software that interacts with a database knows (read: should know) how to read a query execution plan, given by “EXPLAIN PLAN”, and how to avoid at least the most common problems like a full table scan.

It is obvious that a plan can change if the database changes. For example if we add an index that is relevant to our query, it will be used to make our query faster. And this will be reflected in the new plan.

Likewise if the query changes. If instead of

SELECT * FROM mytable WHERE somevalue > 5

the query changes to

SELECT * FROM mytable WHERE somevalue IN 
  (SELECT someid FROM anothertable)

the plan will of course change.

So during a database performance tuning seminar at work, we came to the following question: can the execution plan change if we just change the filter value? Like, if instead of

SELECT * FROM mytable WHERE somevalue > 5

the query changes to

SELECT * FROM mytable WHERE somevalue > 10

It’s not obvious why it should. The columns used, both in the SELECT and the WHERE clause, do not change. So if a human looked at these two queries, they would choose the same way of executing them (e.g. using an index on somevalue if one is available).

But databases have knowledge we don’t have. They have statistics.

Let’s do an example. We’ll use Microsoft SQL Server here. The edition doesn’t really matter; you can use Express, for example. But the idea, and the results, are the same for Oracle or any other major RDBMS.

First off, let’s create a database. Open Management Studio and paste the following (changing the paths as needed):

CREATE DATABASE [PLANTEST]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'PLANTEST', 
FILENAME = N'C:\DATA\PLANTEST.mdf' , 
SIZE = 180MB , FILEGROWTH = 10% )
 LOG ON 
( NAME = N'PLANTEST_log', 
FILENAME = N'C:\DATA\PLANTEST_log.ldf' , 
SIZE = 20MB , FILEGROWTH = 10%)
GO

Note that I’ve allocated a lot of initial space, 180MB. There’s a reason for that: we know that we’ll pump in a lot of data, and we want to avoid the delay of the db files growing.

Now let’s create a table to work on:

USE PLANTEST
GO

CREATE TABLE dbo.TESTWORKLOAD
	(
	testid int NOT NULL IDENTITY(1,1),
	testname char(10) NULL,
	testdata nvarchar(36) NULL
	)  ON [PRIMARY]
GO

And let’s fill it (this can take some time, say around 5-10 minutes):

DECLARE @cnt1 INT = 0;
DECLARE @cnt2 INT = 0;

WHILE @cnt1 < 20
BEGIN
	SET @cnt2 = 0;
	WHILE @cnt2 < 100000
	BEGIN
	   insert into TESTWORKLOAD (testname, testdata) 
             values ('COMMON0001', CONVERT(char(36), NEWID()));
	   SET @cnt2 = @cnt2 + 1;
	END;
	insert into TESTWORKLOAD (testname, testdata) 
          values ('SPARSE0002', CONVERT(char(36), NEWID()));
	SET @cnt1 = @cnt1 + 1;
END;
GO

What I did here is, basically, fill the table with 2 million (20 * 100000) plus 20 rows. Almost all of them (2 million) have the value “COMMON0001” in the testname field. But a few, only 20, have a different value, “SPARSE0002”.

Essentially the table is our proverbial haystack. The “COMMON0001” rows are the hay, and the “SPARSE0002” rows are the needles 🙂

Let’s examine how the database will execute these two queries:

SELECT * FROM TESTWORKLOAD WHERE testname = 'COMMON0001';
SELECT * FROM TESTWORKLOAD WHERE testname = 'SPARSE0002';

Select both of them and, in management studio, press Control+L or the “Display estimated execution plan” button. What you will see is this:

What you see here is that both queries will do a full table scan. That means that the database will go and grab every single row from the table, look at the rows one by one, and give us only the ones that match (the ones with COMMON0001 or SPARSE0002, respectively).

That’s ok when you don’t have a lot of rows (say, up to 5 or 10 thousand), but it’s terribly slow when you have a lot (like our 2 million).

So let’s create an index for that:

CREATE NONCLUSTERED INDEX [IX_testname] ON [dbo].[TESTWORKLOAD]
(
	[testname] ASC
)
GO

And here’s where you watch the magic happen. Select the same queries as above and press Control+L (or the “Display estimated execution plan” button) again. Voila:

What you see here is that, even though the only difference between the two queries is the filter value, the execution plan changes.

Why does this happen? And how?

Well, here’s where statistics come in handy. In the Object Explorer of Management Studio, expand (the “+”) our database and table, and then the “Statistics” folder.

You can see the statistic for our index, IX_testname. If you open it (double click and then go to “details”) you see the following:

So (I’m simplifying a bit here, but not a lot) the database knows how many rows have the value “COMMON0001” (2 million) and how many the value “SPARSE0002” (just 20).

Knowing this, it concludes (that’s the job of the query optimizer) that the best way to execute the 2 queries is different:

The first one (WHERE testname = ‘COMMON0001’) will return almost all the rows of the table. Knowing this, the optimizer decides that it’s faster to just get everything (aka Full Table Scan) and filter out the very few rows we don’t need.

For the second one (WHERE testname = ‘SPARSE0002’), things are different. The optimizer knows that it’s looking only for a few rows, and it’s smartly using the index to find them as fast as possible.

In plain English, if you want the hay out of a haystack, you just get the whole stack. But if you’re looking for the needles, you go find them one by one.
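
The haystack reasoning can be played out with a toy cost model. To be clear, this is not how SQL Server’s optimizer actually computes costs: the per-row cost figures and the function name below are invented purely to show the shape of the decision. The only real inputs are the row counts from our test table, which play the role of the statistics.

```python
def choose_plan(estimated_matches, total_rows,
                seek_cost_per_row=4.0, scan_cost_per_row=1.0):
    """Pick the cheaper access path in a deliberately simplistic cost model.

    A scan touches every row once; an index seek is assumed to cost more
    per matching row (made-up factor), since each hit needs an extra lookup.
    """
    scan_cost = total_rows * scan_cost_per_row
    seek_cost = estimated_matches * seek_cost_per_row
    return "index seek" if seek_cost < scan_cost else "table scan"


# "Statistics": how many rows carry each testname value
histogram = {"COMMON0001": 2_000_000, "SPARSE0002": 20}
total_rows = sum(histogram.values())  # 2,000,020

for value, row_count in histogram.items():
    print(value, "->", choose_plan(row_count, total_rows))
# COMMON0001 -> table scan
# SPARSE0002 -> index seek
```

Same query text, same index, different filter value: the estimated row count alone is enough to tip the decision.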