Multithreaded queue process with C# & BackgroundWorker

And now for something completely different:

This weekend, my mail server was slammed by a spammer using a rogue account to create hundreds of thousands of spam emails that jammed my outbound mail queue. Mixed with the spam were valuable customer emails, so I had to sort through all the mail ASAP and delete anything that wasn’t legit.

First I tried a simple loop that loaded each file and deleted it if it contained a bad string. But that was taking a while, so I made my filter multithreaded.

First, I load a list of files to process:

string[] files = Directory.GetFiles(directory);
Console.WriteLine(files.Length + " files.");

(You can iterate through the files instead, but I wanted to see how many files there are.)

I instantiate the class with the BackgroundWorker:

DeleteProcess DeleteProcess = new DeleteProcess();

Now, I loop through the files, checking each for spam:

            foreach (string mFile in files)
            {
                if (CheckBlacklist(mFile))
                {
                    DeleteProcess.filesToDelete.Add(mFile);
                    if (!DeleteProcess.worker.IsBusy)
                        DeleteProcess.worker.RunWorkerAsync();
                }
            }

Instead of loading the whole file, I just read it until I determine that it is spam. Since 99% of messages were spam, this went pretty quickly:

 private static bool CheckBlacklist(string mFile)
        {
            using (StreamReader reader = new StreamReader(new FileStream(mFile, FileMode.Open, FileAccess.Read)))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Contains("NIGERIA") || line.Contains("Message Delivery Delay"))
                        return true;
                }
            }
            return false;
        }

(By using FileAccess.Read, I speed things up a bit.)

Now for the delete thread. Here is how it’s wired up:

public List filesToDelete = new List();
 
        public BackgroundWorker worker = new BackgroundWorker { WorkerReportsProgress = true, WorkerSupportsCancellation = true };
 
        public DeleteProcess()
        {
            worker.DoWork += worker_DoWork;
            worker.ProgressChanged += worker_ProgressChanged;
            worker.RunWorkerCompleted += worker_RunWorkerCompleted;
        }

The worker thread should get the first file name from the queue, delete the file, and then delete the filename list item:

 private void worker_DoWork(object sender, DoWorkEventArgs e)
        {
            while (filesToDelete.Count > 0)
            {
                worker.ReportProgress(0, filesToDelete[0].Replace(Program.directory, string.Empty));
                File.Delete(filesToDelete[0]);
                File.Delete(filesToDelete[0].Replace(@"OutgoingMessages", @"Outgoing"));
                filesToDelete.RemoveAt(0);
            }
        }

When we’re done, we count the remaining files:

Console.WriteLine(Directory.GetFiles(Program.directory).Length + " files left.");

It’s possible to create a collection of BackgroundWorkers if you want to utilize multiple CPU’s, but the bottleneck in this case was the disk IO, so it wouldn’t help.