When working with XML files, which I usually process with Powershell scripts, sometimes I stumble on the following error message:
Select-Xml : The file 'C:\data\lalala.xml' cannot be read: The input document has exceeded a limit set by MaxCharactersInDocument.
The problem here is that Powershell has a hard-coded limit to stop you from reading huge files. This limit is 536870912 bytes (0.5 Gb).
So what can you do if you need to process such a large file? C# to the rescue! You can add inline C# code that uses XmlTextReader, which is very fast.
The specific problem I had to solve is to get all the IDs from several XML files and create SQL out of them. Then I used this SQL to create a temp table in my SQL server, like this:
SELECT t.* INTO #documentids FROM (
SELECT 'e29e16f9-f77e-4dce-92e0-99fef71c868d' AS id UNION ALL
SELECT '0cd0fef5-0037-492e-861f-593f7322c2a3' AS id UNION ALL
SELECT 'e963a4da-2a60-43f1-b4d4-9b07b9d07d57' AS id UNION ALL
-- <all the stuff created by powershell>
SELECT 'ca70d7b7-40c8-418d-8612-42fe2fd5bf28' AS id
) t
But of course you can easily customize the code to meet your needs.
I tested with an 1.76 Gb XML on my dev laptop. It took 19.6 seconds. Not bad 🙂
Here’s the code I used:
#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, January 2020
#
Clear-Host
$ErrorActionPreference='Stop'
$referencingassemblies = ("System.XML.dll")
Add-Type -ReferencedAssemblies $referencingassemblies -Language CSharp @"
using System;
using System.IO;
using System.Xml;
using System.Collections.Generic;
namespace DotJim.Powershell
{
public static class Xml
{
public static List<string> GetAttributes(string fileName, string localName, string template)
{
var list = new List<string>();
XmlTextReader txtreaderObj = new XmlTextReader(fileName);
txtreaderObj.WhitespaceHandling = WhitespaceHandling.None;
while (txtreaderObj.Read())
{
if (txtreaderObj.LocalName.Equals(localName) &&
txtreaderObj.IsStartElement())
{
txtreaderObj.Read();
//Console.WriteLine(txtreaderObj.Value);
string s = txtreaderObj.Value;
string line = string.Format(template, s);
list.Add(line);
}
}
return list;
}
}
}
"@;
# Here I'm trying to create a big SELECT sql statement out of all the IDs in the xml
$template = "SELECT '{0}' AS id UNION ALL"
$fileList = @("C:\data\printdata1.xml", "C:\data\printdata1.xml")
$StartDate=(Get-Date)
foreach($fileName in $fileList)
{
$list = [DotJim.Powershell.Xml]::GetAttributes($fileName, "DocumentId", $template)
$list[$list.Count-1] = $list[$list.Count-1].Replace("UNION ALL", "")
Out-File -FilePath $fileName.Replace(".xml", ".sql") -InputObject $list
}
$EndDate=(Get-Date)
$timeDiff = New-TimeSpan –Start $StartDate –End $EndDate
Write-Host "Finished processing $($fileList.Count) files in $($timeDiff.TotalSeconds) seconds"
Have fun coding!
Nice one.