Tag Archives: c#

New version of Zoro: 2.0

I just published a new version of my open source C# Zoro library in Github and Nuget.org.

Zoro is a data masking/anonymization utility. It fetches data from a database or a CSV file, masks (i.e. anonymizes) them according to the configuration provided and uses the masked data to create a CSV file or run SQL statements such as INSERTs or UPDATEs.

The new version, 2.0, has been converted to DotNet Standard 2.1 to take advantage of some useful DotNet features. The command line utility and the test project are written with DotNet 5.0.

The issue from 1.0.2, where the Nuget package did not contain the executables, has been corrected. The package now contains both a Win64 and a Linux64 executable. Since they are self-contained programs, no prior installation of DotNet is needed.

But the most important new feature is a new MaskType, “Query”. With this, the library can retrieve values from a database and pick a random one. In previous versions this was only possible with lists that were fixed in the XML (MaskType=List).

For example, let’s say you are masking the following data:

ID   Name   City     Country
1    ABB    Baden    CH
2    FAGE   Athens   GR
3    IKEA   Delft    NL
Table “customers”

In the database you might also have a table with cities and countries:

CityName   CountryCode
Zürich     CH
Geneva     CH
Bern       CH
Rethimno   GR
Chania     GR
Kalamata   GR
Gouda      NL
Geldrop    NL
Table “cities”

In order to anonymize the above data, your config could look like this:

<?xml version="1.0"?>
<MaskConfig xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <FieldMasks>
    <FieldMask>
      <FieldName>ID</FieldName>
      <MaskType>None</MaskType>
    </FieldMask>  
    <FieldMask>
      <FieldName>Name</FieldName>
      <MaskType>Similar</MaskType>
    </FieldMask>      
    <FieldMask>
      <FieldName>Country</FieldName>
      <MaskType>None</MaskType>
    </FieldMask>  
    <FieldMask>
      <FieldName>City</FieldName>
      <MaskType>Query</MaskType>
      <QueryReplacement
      	SelectorField="Country" 
      	GroupField="Countrycode" 
      	ValueField="Cityname" 
      	Query="SELECT cityname, countrycode FROM cities" />
    </FieldMask>  
  </FieldMasks>
  <DataSource>Database</DataSource>
  <DataDestination>Database</DataDestination>
  <ConnectionString>
Server=DBSRV1;Database=appdb;Trusted_Connection=yes;
  </ConnectionString>
  <ConnectionType>
System.Data.SqlClient
  </ConnectionType>
  <SqlSelect>
SELECT * FROM customers
  </SqlSelect>
  <SqlCommand>
INSERT INTO customers_anonymous
(ID, Name, City, Country)
VALUES
($ID, $Name, $City, $Country)
  </SqlCommand>
</MaskConfig>

This will result in a table looking like this:

ID   Name   City      Country
1    EGT    Bern      CH
2    SOLE   Chania    GR
3    UFOE   Geldrop   NL
Table “customers_anonymous”

If you have any questions, please write in the comments.

Enjoy!

Please don’t write logs inside Program Files (here’s how to do it right)

So the other day I’m troubleshooting a Windows Service that keeps failing on a server, part of a product we’re using in the company. Long story short, that’s what the problem was:

Access to the path 'C:\Program Files\whatever\whatever.log' is denied

I mean, dear programmer, look. You want to write your application’s logs as simple text files. I get it. Text files are simple, reliable (if the file system doesn’t work, you have bigger problems than logging) and they’re shown in virtually every coding tutorial in every programming language. Depending on the case, there might be better ways to do that, such as syslog, eventlog and others.

But sure, let’s go with text files. Take the following example somewhere in the middle of a Python tutorial. Look at line 3:

import logging

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
logging.warning('This will get logged to a file')

Did you notice? This code writes the log in the same place as the binary. It’s not explicitly mentioned and usually you wouldn’t give it a second thought, right?

To be clear, I don’t want to be hard on the writers of this or any other tutorial; it’s just a basic tutorial, and as such it should highlight the core concept. A professional developer writing an enterprise product should know a bit more!

But the thing is, these examples are everywhere. Take another Java tutorial and look at line 16:

package com.javacodegeeks.snippets.core;

import java.util.logging.Logger;
import java.util.logging.FileHandler;
import java.util.logging.SimpleFormatter;
import java.io.IOException;

public class SequencedLogFile {

    public static final int FILE_SIZE = 1024;
    public static void main(String[] args) {

        Logger logger = Logger.getLogger(SequencedLogFile.class.getName());
        try {
            // Create an instance of FileHandler with 5 logging files sequences.
            FileHandler handler = new FileHandler("sample.log", FILE_SIZE, 5, true);
            handler.setFormatter(new SimpleFormatter());
            logger.addHandler(handler);
            logger.setUseParentHandlers(false);
        } catch (IOException e) {
            logger.warning("Failed to initialize logger handler.");
        }
        logger.info("Logging info message.");
        logger.warning("Logging warn message.");
    }
}

Or this Dot Net tutorial, which explains how to set up Log4Net (which is great!) and gives this configuration example. Let’s see if you can spot this one. Which line is the problem?

<log4net>
  <root>
    <level value="ALL" />
    <appender-ref ref="LogFileAppender" />
  </root>
  <appender name="LogFileAppender" type="log4net.Appender.RollingFileAppender">
    <file value="proper.log" />
    <lockingModel type="log4net.Appender.FileAppender+MinimalLock" />
    <appendToFile value="true" />
    <rollingStyle value="Size" />
    <maxSizeRollBackups value="2" />
    <maximumFileSize value="1MB" />
    <staticLogFileName value="true" />
    <layout type="log4net.Layout.PatternLayout">
      <conversionPattern value="%d [%t] %-5p %c %m%n" />
    </layout>
  </appender>
</log4net>

If you answered “7”, congrats, you’re starting to get it. Not using a path -this should be obvious, I know, but it’s easy to forget nevertheless- means writing in the current path, which by default is wherever the binary is.
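
If you want to see where that actually is on your machine, here is a minimal C# sketch (just two standard .NET calls, nothing specific to any of the tutorials above):

using System;

// Relative file names like "app.log" or "sample.log" resolve against the
// process' current directory...
Console.WriteLine(Environment.CurrentDirectory);
// ...which, when the exe is started by double-clicking it, is usually the
// same folder the binary was loaded from:
Console.WriteLine(AppContext.BaseDirectory);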

So this works fine while you’re developing. It works fine when you do your unit tests. It probably works when your testers do the user acceptance testing or whatever QA process you have.

But when your customers install the software, the exe usually goes to C:\Program Files (that’s in Windows; in Linux there are different possibilities as explained here, but let’s say /usr/bin). Normal users do not have permission to write there; an administrator can grant this, but they really really really shouldn’t. You’re not supposed to tamper with the executables! Unless you’re doing some maintenance or an upgrade of course.

So how do you do this correctly?

First of all, it’s a good idea to not reinvent the wheel. There are many, many, MANY libraries to choose from, some of them very mature, like log4net for Dot Net or log4j for Java.

But if you want to keep it very simple, fine. There are basically two ways to do it.

If it’s UI-based software that your users will use interactively:

Create a directory under %localappdata% (by default C:\Users\SOMEUSER\AppData\Local) with the brand name of your company and/or product, and write in there.

You can get the localappdata path using the following line in Dot Net:

string localAppDataPath = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);

Take for example the screen-capturing software called Greenshot. These guys do it right: their logs end up under %localappdata%\Greenshot.

If it’s non-interactive software, like a Windows Service:

You can do the same as above, but instead of Environment.SpecialFolder.LocalApplicationData use Environment.SpecialFolder.CommonApplicationData, which by default is C:\ProgramData. So your logs will be in C:\ProgramData\MyAmazingCompany\myamazingproduct.log.
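
Putting it together, a minimal sketch for a service (the company and product names are the same placeholders as above):

using System;
using System.IO;

// C:\ProgramData on a default Windows installation; for an interactive app,
// swap in Environment.SpecialFolder.LocalApplicationData.
string baseDir = Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData);
string logDir = Path.Combine(baseDir, "MyAmazingCompany");
Directory.CreateDirectory(logDir); // does nothing if the directory already exists
string logFile = Path.Combine(logDir, "myamazingproduct.log");
File.AppendAllText(logFile, $"{DateTime.Now:u} service started{Environment.NewLine}");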

Or -not recommended, but not as horrible as writing in Program Files- you can create something custom like C:\MyAmazingCompany\logs. I’ll be honest with you, it’s ugly, but it works.

But in any case, be careful to consider your environment. Is your software supposed to run on Windows, Linux, Mac, everything? A good place to start is here, for Dot Net, but the concept is the same in every language.

And, also important, make your logging configurable! The location should be changeable via a config file. Different systems have different requirements. Someone will need the logs somewhere special for their own reasons.
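
For example, here’s a minimal sketch where an environment variable (the name MYAMAZINGPRODUCT_LOG_DIR is made up for illustration; a config file entry works just as well) overrides the default location:

using System;
using System.IO;

string defaultDir = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData),
    "MyAmazingCompany");
// Whoever operates the system can point the logs somewhere else without recompiling.
string logDir = Environment.GetEnvironmentVariable("MYAMAZINGPRODUCT_LOG_DIR") ?? defaultDir;
Directory.CreateDirectory(logDir);
Console.WriteLine($"Logs go to {logDir}");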

But whatever you do, PLEASE PLEASE PLEASE DON’T WRITE WHERE THE BINARY IS. DON’T WRITE IN C:\PROGRAM FILES. IT. DOES. NOT. WORK.

New version of Zoro: 1.0.2

I just published a new version of my open source C# Zoro(*) library in Github and Nuget.org.

Zoro is a data masking/anonymization utility. It fetches data from a database or a CSV file, and creates a CSV file with masked data.

The new version, 1.0.2, has been converted to DotNet Standard 2.0. The command line utility and the test project have been converted to DotNet 5.0.

There is a known issue, not with the code but with the Nuget package. The description claims, as was intended, that the package contains not only the library but also the exe, which can be used as a standalone command line utility. But due to some wrong path in the Github Action, it doesn’t.

I’ll try to get that fixed in the next weeks. Until then, if you need the exe, please check out the code and build with Visual Studio or Visual Studio Code.

(*) YES NOW I KNOW THIS IS MISSPELLED AND THE CORRECT SPELLING IS ZORRO, I DIDN’T WHEN I STARTED THE LIBRARY, SORRY!

New version of XMLSlurper: 1.3.0

I just published a new version of my open source C# XmlSlurper library in Github and Nuget.org.

The new version, 1.3.0, contains two major bug fixes:

  1. In previous versions, when the xml contained CDATA nodes, an error was thrown (“Type System.Xml.XmlCDataSection is not supported”). This has been fixed, so now the following works:
<CustomAttributes>
    <Title><![CDATA[DOCUMENTO N. 1234-9876]]></Title>
</CustomAttributes>

This xml can be used as follows:

var cdata = XmlSlurper.ParseText(getFile("CData.xml"));
Console.WriteLine(cdata.Title);
// produces 'DOCUMENTO N. 1234-9876'
  2. In previous versions, when the xml contained xml comments, an error was thrown (“Type System.Xml.XmlComment is not supported”). This has been fixed; the xml comments are now ignored.
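
As a quick illustration (a minimal sketch; the file name Comments.xml and its content are made up for the example), the following now works without errors:

var settings = XmlSlurper.ParseText(getFile("Comments.xml"));
// where Comments.xml contains, for instance:
// <Settings>
//     <!-- this comment used to throw, now it is simply ignored -->
//     <Timeout>30</Timeout>
// </Settings>
Console.WriteLine(settings.Timeout);
// produces '30'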

Separately, there are a few more changes that don’t impact the users of the library:

  1. A Github action was added that, when the package version changes, automatically builds and tests the project, creates a Nuget package and publishes it to Nuget.org. That will save me quite some time come next version 🙂
  2. The test project was migrated from DotNet Core 2.2 to 3.1.
  3. The tests were migrated from MSTest to xUnit, to make the project able to be developed both in Windows and in Linux -my personal laptop runs Ubuntu.

The new version is backwards compatible with all previous versions. So if you use it, updating your projects is effortless and strongly recommended.

Powershell: how to overcome MaxCharactersInDocument limit when reading XML

When working with XML files, which I usually process with Powershell scripts, sometimes I stumble on the following error message:

Select-Xml : The file 'C:\data\lalala.xml' cannot be read: The input document has exceeded a limit set by MaxCharactersInDocument.

The problem here is that Powershell has a hard-coded limit to stop you from reading huge files. This limit is 536870912 bytes (0.5 GB).

So what can you do if you need to process such a large file? C# to the rescue! You can add inline C# code that uses XmlTextReader, which is very fast.

The specific problem I had to solve was to get all the IDs from several XML files and create SQL out of them. I then used this SQL to create a temp table in my SQL Server, like this:

SELECT t.* INTO #documentids FROM ( 
  SELECT 'e29e16f9-f77e-4dce-92e0-99fef71c868d' AS id UNION ALL
  SELECT '0cd0fef5-0037-492e-861f-593f7322c2a3' AS id UNION ALL
  SELECT 'e963a4da-2a60-43f1-b4d4-9b07b9d07d57' AS id UNION ALL
  -- <all the stuff created by powershell> 
  SELECT 'ca70d7b7-40c8-418d-8612-42fe2fd5bf28' AS id 
) t

But of course you can easily customize the code to meet your needs.

I tested with a 1.76 GB XML on my dev laptop. It took 19.6 seconds. Not bad 🙂

Here’s the code I used:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, January 2020
#
Clear-Host
$ErrorActionPreference='Stop'

$referencingassemblies = ("System.XML.dll")
Add-Type -ReferencedAssemblies $referencingassemblies -Language CSharp @"
using System;
using System.IO;
using System.Xml;
using System.Collections.Generic;
namespace DotJim.Powershell 
{
    public static class Xml
    {
        public static List<string> GetAttributes(string fileName, string localName, string template)
        {
            var list = new List<string>();
            XmlTextReader txtreaderObj = new XmlTextReader(fileName);
            txtreaderObj.WhitespaceHandling = WhitespaceHandling.None;
            while (txtreaderObj.Read())
            {
                if (txtreaderObj.LocalName.Equals(localName) &&
                txtreaderObj.IsStartElement())
                {
                     txtreaderObj.Read();
                     //Console.WriteLine(txtreaderObj.Value);
                     string s = txtreaderObj.Value;
                     string line = string.Format(template, s);
                     list.Add(line);
                }
            } 
            return list;
        }
    }
}
"@;

# Here I'm trying to create a big SELECT sql statement out of all the IDs in the xml
$template = "SELECT '{0}' AS id UNION ALL"
$fileList = @("C:\data\printdata1.xml", "C:\data\printdata2.xml")

$StartDate=(Get-Date)

foreach($fileName in $fileList)
{
    $list = [DotJim.Powershell.Xml]::GetAttributes($fileName, "DocumentId", $template)
    $list[$list.Count-1] = $list[$list.Count-1].Replace("UNION ALL", "")
    Out-File -FilePath $fileName.Replace(".xml", ".sql") -InputObject $list
}

$EndDate=(Get-Date)

$timeDiff = New-TimeSpan -Start $StartDate -End $EndDate
Write-Host "Finished processing $($fileList.Count) files in $($timeDiff.TotalSeconds) seconds"

Have fun coding!

Lefkogeia – a REST API test server

Lefkogeia is a server to test your REST APIs with. When it runs, it accepts every request made on the configured IP/port and returns an HTTP 200 “Thank you for your {method name, GET/POST etc}”. As you can imagine, I developed it for my own needs and then thought it would be handy for others, so I published it on Github.

You can download its first release here. The project’s intro page is here.

It logs all requests in a directory imaginatively called logs. It creates an access.txt file where all requests are written, and one file per request (000001.txt, 000002.txt etc) in which the request’s payload (e.g. an xml or a json) is written.

Usage

The primary use of Lefkogeia is to test/debug/troubleshoot REST API and web service clients. You run it (see release notes on that) and get your client to call it. It will log whatever was sent, allowing you to troubleshoot whatever problem you might have.
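
For instance, here is a minimal C# client sketch, assuming Lefkogeia is listening on http://localhost:6800 as in the configuration example further down:

using System;
using System.Net.Http;
using System.Text;

// POST a small json payload; Lefkogeia should reply with HTTP 200 and write the
// payload to a numbered file (000001.txt etc) under its logs directory.
using var client = new HttpClient();
var payload = new StringContent("{\"hello\":\"world\"}", Encoding.UTF8, "application/json");
var response = await client.PostAsync("http://localhost:6800/", payload);
Console.WriteLine($"{(int)response.StatusCode} {await response.Content.ReadAsStringAsync()}");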

Configuration

To configure Lefkogeia, edit the appsettings.json file with a text editor.

An example appsettings.json file to serve multiple addresses & ports would be:

{
	"Logging": {
		"LogLevel": {
			"Default": "Debug",
			"System": "Information",
			"Microsoft": "Information"
		}
	},
	"Host": {
		"Url": [
			"http://localhost:6800",
			"http://server1:7777",
			"http://147.102.43.3:4545"
		]
	}
}

Paths are not yet supported in URLs, so if you change http://server1:7777 to http://server1:7777/testapi you will get an error. This is planned for the next release.

Also note that in order to use https:// you need to generate a certificate by running

dotnet dev-certs https --trust

For more info see https://go.microsoft.com/fwlink/?linkid=848054.

But why “Lefkogeia”, what does this even mean?

Because it’s such a beautiful place! Lefkogeia is a small village in southern Crete, Greece, with amazing beaches like Ammoudi, Shinaria, Klisidi and more. You can read more in Tripadvisor.

Weird regional settings problems

If you’ve ever had to share files with data between different countries, you know that this can be problematic. For example, in Greece and the Netherlands the number “one thousand three hundred comma five” is written as “1 dot 300 comma 5”, in the UK it’s written as “1 comma 300 dot 5”, in Switzerland as “1 apostrophe 300 comma 5” etc etc. Same goes for dates.
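
A quick C# illustration (just a sketch; the cultures picked are examples) of how much the same number changes shape per culture:

using System;
using System.Globalization;

// One thousand three hundred point five, formatted per culture:
double value = 1300.5;
Console.WriteLine(value.ToString("N1", CultureInfo.GetCultureInfo("el-GR"))); // 1.300,5
Console.WriteLine(value.ToString("N1", CultureInfo.GetCultureInfo("en-GB"))); // 1,300.5
Console.WriteLine(value.ToString("N1", CultureInfo.GetCultureInfo("de-CH"))); // 1'300.5 (the exact apostrophe character depends on the .NET version)

// And parsing depends on the culture just as much:
double greek = double.Parse("1.300,5", CultureInfo.GetCultureInfo("el-GR"));   // 1300.5
double british = double.Parse("1,300.5", CultureInfo.GetCultureInfo("en-GB")); // also 1300.5, from a differently-written string
Console.WriteLine(greek == british); // True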

So if you write software that is meant to be used in different countries, you have to be very careful and test thoroughly. And even then, you can run into problems. Just today I managed to solve a very weird one: Dutch-formatted numbers in an Excel file with Swiss settings caused an error message which, on the face of it, had nothing to do with formatting.

Y’know, 9/11 is the ninth of November in Greece

But the strangest, most incomprehensible, 100% bang-your-head-on-the-wall problem I had was around 2005. My team wrote software that was meant to be multi-cultural and was used in Greece, Cyprus, Malta, Portugal, Turkey, Brazil and China (I may have missed a country or two after all these years).

So at some point my manager and I had to fly to Cyprus to test the software on-site; we went to a few of our customers and tried it out. And we were getting very, very, very strange error messages when doing simple, tried-and-true stuff. For a while we were flabbergasted.

After tearing my hair out and troubleshooting like crazy for hours on end, I noticed something which, while unusual, at first sight had nothing to do with our problems: our customers in Cyprus had set their Windows regional settings to use a dot as the thousand separator (according to the Greek settings) and… a dot (again) as the decimal separator (according to the UK settings).

Having tried virtually everything I changed it, just for the hell of it. I think I tried the normal Greek settings at first. And, like magic, everything was fixed! No errors whatsoever, everything ran smoothly!

You can imagine my astonishment.

I also tried a different setting (UK) and it was fine. I switched it back to the “special” Cyprus setting, and, sure enough, the problem started again. Now that I knew what to look for, I discovered that our software was “confused” (threw an error) when trying to understand just what kind of number 1 dot 234 dot 05 is.

Powershell: How do you add inline C#?

Powershell is great for admin tasks. Stuff like iterating through files and folders, or copying and transforming files, is very, very easily done. But inevitably there will always be stuff that is easier to do via a “normal” language such as C#.

Trying to solve a problem I had at work, I needed to transform a CSV file by changing the fields -which is easily done via powershell- and, at the same time, do a “get only the highest record of every group”. This is done with LINQ, which you can use in powershell but it’s cumbersome and will result in many, many lines of code.

So I wanted to do this in a more clean way, in C#. The general template to include C# inside a powershell script is the following:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public static class Magician 
    {
        private static string spell = ""; 
        public static void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public static string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# And here's how to call it:
[DotJim.Powershell.Magician]::DoMagic("Expelliarmus")
$spell = [DotJim.Powershell.Magician]::GetMagicSpells()

Write-Host $spell

Note here that the C# classes don’t have to be static; but if they are, they’re easier to call (no instantiation needed). Of course this only works if all you need to do is provide an input and get a manipulated output. If you need more complex stuff then yes, you can use non-static classes or whatever C# functionality solves your problems. Here’s the previous example, but with a non-static class:

#
# Source: DotJim blog (https://dandraka.com)
# Jim Andrakakis, November 2018
#
# Here goes the C# code:
Add-Type -Language CSharp @"
using System; 
namespace DotJim.Powershell 
{
    public class Magician 
    {
        private string spell = ""; 
        public void DoMagic(string magicSpell) 
        {
            spell = magicSpell; 
        }
        public string GetMagicSpells() 
        {
            return "Wingardium Leviosa\r\n" + spell; 
        }
    }
}
"@;

# Here's how to create an instance:
$houdini = New-Object -TypeName DotJim.Powershell.Magician
# And here's how to call it:
$houdini.DoMagic("Expelliarmus")
$spell = $houdini.GetMagicSpells()

Write-Host $spell

The main advantage of having C# inside the powershell script (and not in a separate dll file) is that it can be deployed very easily with various Devops tools. Otherwise you need to deploy the dll alongside the script, which can sometimes be a source of trouble.

So here’s my complete working code, which worked quite nicely:

#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, November 2018
#
# The purpose of this script is to read a CSV file with bank data
# and transform it into a different CSV.
#
# 1. The Bank class is a POCO to hold the data which I need
#    from every line of the CSV file.
# 2. The Add() method of the BankAggregator class adds the
#    record to the list after checking the data for correctness.
# 3. The Get() method of the BankAggregator class does a
#    LINQ query to get the 1st (max BankNr) bank record
#    from every record with the same Country/BIC.
#    It then returns a list of strings, formatted the way
#    I want for the new (transformed) CSV file.
#
# Here is where I inline the C# code:
Add-Type -Language CSharp @"
using System;
using System.Collections.Generic;
using System.Linq;
namespace DotJim.Powershell {
 public class Bank {
  public int BankNr;
  public string Country;
  public string BIC;
 }
 public static class BankAggregator {
  private static List<Bank> list = new List<Bank>();
  public static void Add(string country, string bic, string bankNr) {
   //For debugging
   //Console.WriteLine(string.Format("{0}{3}{1}{3}{3}{2}", country, bic, bankNr, ";"));
   int mBankNr;
   // Check data for correctness, discard if not ok
   if (string.IsNullOrWhiteSpace(country) ||
    country.Length != 2 ||
    string.IsNullOrWhiteSpace(bic) ||
    string.IsNullOrWhiteSpace(bankNr) ||
    !int.TryParse(bankNr, out mBankNr) ||
    mBankNr <= 0) { // discard non-positive bank numbers
    return;
   }
   list.Add(new Bank() {
    BankNr = mBankNr, Country = country, BIC = bic
   });
  }
  public static List<string> Get(string delimiter) {
   // For every record with the same Country & BIC, keep only
   // the record with the highest BankNr
   var bankList = from b in list
   group b by new {
    b.Country, b.BIC
   }
   into bankGrp
   let maxBankNr = bankGrp.Max(x => x.BankNr)
   select new Bank {
    Country = bankGrp.Key.Country,
     BIC = bankGrp.Key.BIC,
     BankNr = maxBankNr
   };
   // Format the list the way I want the new CSV file to look
   return bankList.Select(x => string.Format("{0}{3}{1}{3}{3}{2}",
    x.Country, x.BIC, x.BankNr, delimiter)).ToList();
  }
 }
}
"@;

# Read one or more files with bank data from the same dir
# where the script is located ($PSScriptRoot)
$srcSearchStr = "source_bankdata*.csv"
$SourcePath = $PSScriptRoot
$destPath = $SourcePath

$fields = @("Country","BIC","EmptyField","BankId")

$filesList = Get-ChildItem -Path $SourcePath -Filter $srcSearchStr

foreach ($file in $filesList)
{
    Write-Host "Processing" $file.FullName

    # Fields in the source CSV:
    # BANKNUMMER  = BankNr
    # BANKLAND    = Country
    # BANKSWIFT   = BIC
    $data = Import-Csv -Path $file.FullName -Delimiter ";"

    foreach ($item in $data)
    {
        # Call the C# code to add the CSV lines to the list
        [DotJim.Powershell.BankAggregator]::Add($item.BANKLAND,$item.BANKSWIFT,$item.BANKNUMMER)
    }

    # Call the C# code to get the transformed data
    $list = [DotJim.Powershell.BankAggregator]::Get(";")

    Write-Host "Found" $list.Count "valid rows"

    # Now that we have the list, write it in the new CSV
    Out-File -FilePath "$destPath\transformed_bankdata_$(New-Guid).csv" -Encoding UTF8 -InputObject $list
}

Have fun coding!

Oh the joy of coding with German Office

So I’m maintaining an Access database, with lots of VBA code, which serves as an internal management tool.

Changing a report should be, and usually is, pretty straightforward. But there I am today, after all changes are done, stuck trying to understand WHY THE FREAKING F*** a format string doesn’t work.

The offender is a text box with a date, and I’m trying to get it to be displayed like Nov-2015. So, according to Microsoft’s documentation, the format string is mmm-yyyy. Needless to say, didn’t work.

After an hour or so of banging my head on the wall, enlightenment comes: year in German is Jahr!!! So the format string is mmm-jjjj, and, pronto, it worked like a charm.

By the way, Microsoft’s German doc is wrong (seems like a copy-paste error): it mentions jj and yyyy instead of jj and jjjj, which are what actually work (giving 15 and 2015 respectively).

GRRRRRRRRRRRRRRRRRRR *#@$%