PowerShell: scan a file with regex and write the output

Let’s say you have a log file. There’s some info in there, like URLs, that you need in a list.

Copy-pasting? Hell no, PowerShell to the rescue!

PowerShell
#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, February 2025
#
# Change the regex to fit your purposes
# and of course the input file
$regEx = 'https?://[^\s/$.?#].[^\s]*'
$inputFile = "C:\logs\mybiglog.txt"
$outputFile = [System.IO.Path]::Combine([System.IO.Path]::GetDirectoryName($inputFile), "out_$([guid]::NewGuid().ToString().Split('-')[0]).txt")
$content = Get-Content -Path $inputFile -Raw
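# Use $found rather than $matches: $Matches is an automatic variable
# that PowerShell fills in itself after the -match operator
$found = [regex]::Matches($content, $regEx)
$found | ForEach-Object { $_.Value } | Out-File -FilePath $outputFile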
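
Before pointing the pattern at a real log, it can help to try it on a short string first. The snippet below is a minimal sketch: the sample text and the `$sample` and `$found` names are made up for illustration, only the regex comes from the script.

```powershell
# Same pattern as in the script; single quotes keep the $ literal
$regEx = 'https?://[^\s/$.?#].[^\s]*'

# Hypothetical log line, not taken from a real log
$sample = 'GET https://example.com/a?b=1 200 then http://test.local/x and no-url-here'

# @() makes sure we get an array even if there is only one match
$found = @([regex]::Matches($sample, $regEx) | ForEach-Object { $_.Value })
$found
```

Keep in mind that the pattern grabs everything up to the next whitespace, so a URL followed directly by a comma or a closing bracket will include that character too.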

Reading the whole file in one go is the easy way. And it works… unless the log file is big, meaning more than a few GB. In that case, trying to fit the whole file in memory (which Get-Content -Raw does) is going to blow up your system.

So, what do you do? You stream. No, not like Netflix. Well, kind of:

PowerShell
#
# Source: DotJim blog (http://dandraka.com)
# Jim Andrakakis, February 2025
#
# Change the regex to fit your purposes
# and of course the input file
$inputFile = "C:\logs\mybiglog.txt"
$outputFile = "C:\logs\out_$([guid]::NewGuid().ToString().Split('-')[0]).txt"
$regEx = [regex]'https?://[^\s/$.?#].[^\s]*'
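# Create a stream reader for the input and a writer for the output
$reader = $null
$writer = $null
try {
    $reader = [System.IO.File]::OpenText($inputFile)
    $writer = [System.IO.StreamWriter]::new($outputFile)
    # Compare against $null explicitly: an empty line comes back as "",
    # which PowerShell treats as false and would end the loop too early
    while ($null -ne ($line = $reader.ReadLine())) {
        foreach ($match in $regEx.Matches($line)) {
            $writer.WriteLine($match.Value)
        }
    }
}
finally {
    # Always dispose your streams to flush the output and release the file locks
    if ($null -ne $writer) { $writer.Dispose() }
    if ($null -ne $reader) { $reader.Dispose() }
}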
Write-Host "Processing complete. Results saved to: $outputFile"
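
If you'd rather stay with cmdlets, Select-String is an alternative worth knowing. It is a different technique from the StreamReader loop, not a replacement for it: as far as I know it also goes through the file line by line, and -AllMatches returns every hit on a line instead of only the first. The sketch below runs on temp files with made-up sample lines, so the paths and contents are assumptions for illustration only.

```powershell
# Hypothetical sample data so the sketch can run on its own
$inputFile  = [System.IO.Path]::GetTempFileName()
$outputFile = [System.IO.Path]::GetTempFileName()
Set-Content -Path $inputFile -Value @(
    'first line, see http://example.com/one',
    '',
    'more here: https://example.org/two and https://example.org/three'
)

$regEx = 'https?://[^\s/$.?#].[^\s]*'

# -AllMatches keeps every match on a line; Matches.Value unrolls them to strings
Select-String -Path $inputFile -Pattern $regEx -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Set-Content -Path $outputFile

$result = @(Get-Content -Path $outputFile)
$result
```

Note that Select-String matches case-insensitively by default (add -CaseSensitive if that matters), and the pipeline is likely to be slower than the raw .NET loop on really big files, so measure before you pick one.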

Have fun coding!