And now for something completely different:
This weekend, my mail server was slammed by a spammer who used a rogue account to send hundreds of thousands of spam emails, jamming my outbound mail queue. Mixed in with the spam were valuable customer emails, so I had to sort through all the mail ASAP and delete anything that wasn’t legit.
First I tried a simple loop that loaded each file and deleted it if it contained a bad string. But that was taking a while, so I made my filter multithreaded.
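For reference, that single-threaded first attempt amounts to something like the sketch below. It is a reconstruction, not the code I actually ran: `SequentialFilter` and `Run` are made-up names, and the two blacklist strings are borrowed from the `CheckBlacklist` method shown later.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical reconstruction of the single-threaded first pass:
// load each queued message in full and delete it if it contains a blacklisted string.
public static class SequentialFilter
{
    // Same two strings the multithreaded version checks for
    static readonly string[] Blacklist = { "NIGERIA", "Message Delivery Delay" };

    // Returns the number of files deleted
    public static int Run(string directory)
    {
        int deleted = 0;
        foreach (string file in Directory.GetFiles(directory))
        {
            // Loads the whole message before checking it
            string text = File.ReadAllText(file);
            if (Blacklist.Any(bad => text.Contains(bad)))
            {
                File.Delete(file);
                deleted++;
            }
        }
        return deleted;
    }
}
```

Reading, matching, and deleting all happen one after another on a single thread, which is what the version below splits apart.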
For the multithreaded version, I start by loading the list of files to process:
string[] files = Directory.GetFiles(directory);
Console.WriteLine(files.Length + " files.");
(You can iterate through the files instead, but I wanted to see how many files there are.)
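If you don't need the total up front, `Directory.EnumerateFiles` hands back names lazily instead of building the whole array first, which can matter with hundreds of thousands of files. A minimal sketch; `StreamingScan`, `FindSpam`, and the `isSpam` predicate parameter are hypothetical names. Because the directory is read as you go, files added or removed during the scan may or may not be seen.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingScan
{
    // Yields each spam file as soon as its name is enumerated,
    // without materializing the full file list first.
    public static IEnumerable<string> FindSpam(string directory, Func<string, bool> isSpam)
    {
        foreach (string file in Directory.EnumerateFiles(directory))
        {
            if (isSpam(file))
                yield return file;
        }
    }
}
```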
Next, I instantiate the class that wraps the BackgroundWorker:
DeleteProcess DeleteProcess = new DeleteProcess();
Now, I loop through the files, checking each for spam:
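foreach (string mFile in files)
{
    if (CheckBlacklist(mFile))
    {
        lock (DeleteProcess.filesToDelete) // List<string> is not thread-safe
            DeleteProcess.filesToDelete.Add(mFile);
        if (!DeleteProcess.worker.IsBusy)
            DeleteProcess.worker.RunWorkerAsync();
    }
}
// The worker can stop right after an Add that still saw IsBusy == true,
// so keep restarting it until the list is empty.
while (DeleteProcess.worker.IsBusy || DeleteProcess.filesToDelete.Count > 0)
{
    if (!DeleteProcess.worker.IsBusy)
        DeleteProcess.worker.RunWorkerAsync();
    System.Threading.Thread.Sleep(100);
}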
Instead of loading the whole file, I read it line by line and stop as soon as I find a blacklisted string. Since 99% of the messages were spam, this went pretty quickly:
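// Returns true as soon as any line contains a blacklisted string,
// so most spam is identified without reading the whole message.
private static bool CheckBlacklist(string mFile)
{
    // Open read-only; the reader and its FileStream are disposed on every return path
    using (StreamReader reader = new StreamReader(new FileStream(mFile, FileMode.Open, FileAccess.Read)))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // string.Contains is a case-sensitive match
            if (line.Contains("NIGERIA") || line.Contains("Message Delivery Delay"))
                return true;
        }
    }
    return false;
}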
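(Opening with FileAccess.Read, instead of FileStream’s default read/write access, helps a bit too: the check only needs read permission on each message.)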
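To try the early-exit check without real mail files, the same loop can be written against a `TextReader`, so a `StringReader` can stand in for the file. This is a sketch; `BlacklistCheck`, `ContainsBlacklisted`, and the `blacklist` parameter are my own names. Like `line.Contains`, it matches case-sensitively, so "Nigeria" would slip through unless you switch to `StringComparison.OrdinalIgnoreCase`.

```csharp
using System;
using System.IO;

public static class BlacklistCheck
{
    // Returns true as soon as any line contains one of the blacklisted strings.
    public static bool ContainsBlacklisted(TextReader reader, string[] blacklist)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string bad in blacklist)
            {
                // Ordinal comparison: case-sensitive, same as string.Contains
                if (line.IndexOf(bad, StringComparison.Ordinal) >= 0)
                    return true;
            }
        }
        return false;
    }

    // File-based wrapper that behaves like CheckBlacklist with a configurable list
    public static bool ContainsBlacklisted(string path, string[] blacklist)
    {
        using (var reader = new StreamReader(new FileStream(path, FileMode.Open, FileAccess.Read)))
            return ContainsBlacklisted(reader, blacklist);
    }
}
```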
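Now for the delete thread. The DeleteProcess class holds the list of files to delete and the BackgroundWorker, and wires up the worker’s events in its constructor: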
// Spam files waiting to be deleted; the scanning loop adds to this list
public List<string> filesToDelete = new List<string>();
public BackgroundWorker worker = new BackgroundWorker { WorkerReportsProgress = true, WorkerSupportsCancellation = true };

public DeleteProcess()
{
    worker.DoWork += worker_DoWork;                         // runs on a thread-pool thread
    worker.ProgressChanged += worker_ProgressChanged;
    worker.RunWorkerCompleted += worker_RunWorkerCompleted;
}
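The worker thread repeatedly takes the first file name off the list, reports it, and deletes that file along with its counterpart in the Outgoing folder. Because the main thread is still adding names to the same List<string>, which is not thread-safe, each take happens inside a lock:
private void worker_DoWork(object sender, DoWorkEventArgs e)
{
    while (true)
    {
        string file;
        // Only touch the shared list while holding its lock
        lock (filesToDelete)
        {
            if (filesToDelete.Count == 0)
                break;
            file = filesToDelete[0];
            filesToDelete.RemoveAt(0);
        }
        worker.ReportProgress(0, file.Replace(Program.directory, string.Empty));
        File.Delete(file);
        // Also delete the matching file in the Outgoing folder
        File.Delete(file.Replace("OutgoingMessages", "Outgoing"));
    }
}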
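A shared List plus IsBusy polling gets the job done, but it needs locking, and RemoveAt(0) shifts every remaining element on each call. A more conventional producer/consumer shape in .NET is `BlockingCollection<string>` with one consumer task: the scanning loop calls `Add`, calls `CompleteAdding` when it runs out of files, and the consumer's `GetConsumingEnumerable` loop ends only once the collection is both empty and marked complete, so no file can be queued after the consumer has quit. This is an alternative sketch, not the code I ran; `QueuedDeleter`, `Enqueue`, `Finish`, and `Deleted` are hypothetical names, and the OutgoingMessages/Outgoing rule is copied from the worker above.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class QueuedDeleter
{
    private readonly BlockingCollection<string> queue = new BlockingCollection<string>();
    private readonly Task consumer;

    // Number of queued files processed; safe to read after Finish() returns
    public int Deleted { get; private set; }

    public QueuedDeleter()
    {
        // A single long-running consumer, since the disk is the bottleneck
        consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding once drained
            foreach (string file in queue.GetConsumingEnumerable())
            {
                File.Delete(file);
                // Counterpart file, mirroring the original worker's folder rule
                File.Delete(file.Replace("OutgoingMessages", "Outgoing"));
                Deleted++;
            }
        }, TaskCreationOptions.LongRunning);
    }

    public void Enqueue(string file) => queue.Add(file);

    // Call once the scan is finished; returns after every queued file is deleted
    public void Finish()
    {
        queue.CompleteAdding();
        consumer.Wait();
    }
}
```

Usage mirrors the scanning loop: call `Enqueue` for each file that fails `CheckBlacklist`, then `Finish()` before counting what is left.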
When everything is done, I count the files that are left:
Console.WriteLine(Directory.GetFiles(Program.directory).Length + " files left.");
You could create a collection of BackgroundWorkers to put multiple CPUs to work, but the bottleneck here was disk I/O, so more threads wouldn’t have helped.
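If the check were CPU-bound (heavy regex matching, say), `Parallel.ForEach` with a capped `MaxDegreeOfParallelism` spreads the scan across cores with less bookkeeping than a hand-managed set of BackgroundWorkers. This is only a sketch; `ParallelScan`, `FindSpam`, `isSpam`, and `maxThreads` are hypothetical names, and on a disk-bound job like this one it would likely be no faster.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ParallelScan
{
    // Checks files on up to maxThreads threads and returns the ones flagged as spam.
    public static string[] FindSpam(string directory, Func<string, bool> isSpam, int maxThreads)
    {
        var spam = new ConcurrentBag<string>(); // thread-safe result collection
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.ForEach(Directory.EnumerateFiles(directory), options, file =>
        {
            if (isSpam(file))
                spam.Add(file);
        });
        return spam.ToArray(); // order is not guaranteed
    }
}
```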