I was assigned to a legacy project at my job a week ago. The problem is that the code carries a lot of artifacts from its origins in a pre-versioning era: obvious or (really) unnecessary comments, stray whitespace, extra blank lines, etc. This increases the cognitive load required to understand the code and makes it incredibly hard to read.
So I talked to my team about it, and we started an initiative to remove that clutter from the code, making it cleaner, more maintainable and more readable.
For safety, we decided to remove only things that don't affect code execution at all: unnecessary comments, commented-out code, excessive blank lines and unreachable statements. Since some classes have more than 800 lines of code, I wrote 2 simple regexes to make my job easier. They help me find those occurrences, analyze them, and replace them if necessary.
(\/{2,}.+)|(\/{2,})
This will find and highlight any comment starting with "//". It doesn't matter whether the comment is empty, sits on a line of its own, or comes at the end of a valid line of code.
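To see what this pattern picks up outside the IDE, here is a minimal sketch in Python (not the tool I actually use, just a convenient way to run the regex). The PHP snippet and the `find_line_comments` helper are made up for illustration.

```python
import re

# The line-comment pattern from the article: two or more slashes,
# optionally followed by text up to the end of the line.
LINE_COMMENT = re.compile(r"(\/{2,}.+)|(\/{2,})")

# Hypothetical PHP snippet, used only for illustration.
php_source = """<?php
// set the total
$total = 0; // initialise
//
$url = "http://example.com";
"""


def find_line_comments(source):
    """Return (line_number, matched_text) for every match of the pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in LINE_COMMENT.finditer(line):
            hits.append((number, match.group(0)))
    return hits


hits = find_line_comments(php_source)
for number, text in hits:
    print(number, repr(text))
```

Note that the last hit is not a comment at all: the pattern also matches the "//" inside the URL string. That is exactly why every match should be reviewed before it is replaced.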
\/\*[^*](.|\n)*?\*\/
This will find and highlight any block comment that is not a PHPDoc block (those usually start with /** instead of /*).
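The same kind of check works for block comments. This is a small Python sketch, assuming the pattern is written with the asterisks and slashes escaped so the regex engine treats them literally. The PHP snippet and the `find_block_comments` helper are hypothetical.

```python
import re

# Block-comment pattern: "/*" followed by anything except a second "*",
# then the shortest possible run of characters up to the closing "*/".
BLOCK_COMMENT = re.compile(r"\/\*[^*](.|\n)*?\*\/")

# Hypothetical PHP snippet, used only for illustration.
php_source = """<?php
/**
 * Returns the total.
 */
function total() {
    /* old code
    $x = 1;
    */
    return 0; /* inline */
}
"""


def find_block_comments(source):
    """Return the text of every block comment that is not a /** doc block."""
    return [match.group(0) for match in BLOCK_COMMENT.finditer(source)]


blocks = find_block_comments(php_source)
for block in blocks:
    print(repr(block))
```

The doc block above the function is skipped because its third character is an asterisk, while the multi-line and inline block comments are both reported.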
How am I using these regexes?
My IDE of choice is JetBrains' PhpStorm. I use its find-and-replace tool to locate the occurrences (with the regexes), check whether each one is safe to remove, and replace it with a "\n" (new line) character. Every robust text editor or IDE has an equivalent tool, and you can use it the same way.
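If you want to script the same find, review and replace loop instead of clicking through an IDE, it could look roughly like this in Python. This is only a sketch: `replace_reviewed` and the `approved` set are names I made up, and the set stands in for the manual review step.

```python
import re

# The line-comment pattern from the article.
LINE_COMMENT = re.compile(r"(\/{2,}.+)|(\/{2,})")


def replace_reviewed(source, pattern, approve):
    """Replace each match with a newline, but only if approve(text) is true."""
    def substitute(match):
        text = match.group(0)
        # Rejected matches are written back unchanged.
        return "\n" if approve(text) else text

    return pattern.sub(substitute, source)


# Hypothetical input: one real comment and one URL that merely contains "//".
source = '$a = 1; // set the total\n$url = "http://example.com";\n'

# Stand-in for the manual review: only matches listed here get removed.
approved = {"// set the total"}

cleaned = replace_reviewed(source, LINE_COMMENT, lambda text: text in approved)
print(cleaned)
```

The comment is replaced by a newline, while the URL match is left alone because it was never approved.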
The results
I'm splitting the changes into multiple pull requests. This helps with the code review process, making it more manageable for my coworkers. At the time of writing, I've opened 5 PRs, with an average of 20 files per PR.
The changes should be deployed soon, and they're going to help a lot with future tasks in this project. Just looking at the code, the difference is night and day.