This blog will contain a host of information about various vulnerabilities and thoughts related to vulnerability management.
2025-02-15
I originally set out to write about CVE-2025-1094, a PostgreSQL vulnerability that allows SQL injection even through a properly sanitized string. How could this not catch my attention?
As per my usual, I went to the project's code repository, found the patch, and started analysing it. From there, I got a surface-level understanding of the vulnerability and quickly realized I would need to dig deeper into Unicode encoding schemes if I were to really understand it and be able to use it in demonstrations or similar activities.
I got lazy.
So I went looking for a proof of concept (PoC) that would fast-track my understanding of the problematic inputs.
My fun got cut short.
Stephen Fewer, principal security researcher at Rapid7 and the guy who discovered the vulnerability, had already put together an amazing write-up (targeted at another vulnerability chained with this one) for everyone to learn from. No point in me spending the time on a full analysis… What a bummer!
However…
I still find this vulnerability rather interesting from a philosophical standpoint…
The vulnerability exists because a string containing an invalid multibyte character (invalid UTF-8) can pass through the sanitizer function and still, in some cases, produce an SQL string in which an unescaped quote has “snuck in” inside that character, because of how the invalid sequence is handled.
For example, if the (invalid) UTF-8 sequence 0xC0 0x27 were sent to the sanitization function, it would be copied as is because, taken as a whole, it is not a quote or any other dangerous SQL character. However, the byte 0xC0 never appears in valid UTF-8: it could only start an overlong encoding, which the standard forbids. When the string is later interpreted, instead of triggering an error, the two bytes are interpreted individually, and the second one, 0x27, is "decoded" as the ASCII single quote character.
If you want to dig deeper into why 0xC0 is an issue here, I strongly recommend you start by looking into the byte-map section of the Wikipedia page for UTF-8.
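To make the mismatch concrete, here is a toy model in Python. It is not PostgreSQL's code: `naive_escape` and `count_unescaped_quotes` are hypothetical helpers written for this illustration, and the assumption that the escaper blindly trusts a lead byte in the 0xC0 to 0xDF range is a deliberate simplification. The point is only to show how an escaper that thinks in characters and a consumer that falls back to single bytes can disagree about where a quote is.

```python
def naive_escape(raw: bytes) -> bytes:
    """Toy escaper (hypothetical): doubles single quotes, but copies any
    2-byte sequence announced by a 0xC0-0xDF lead byte without checking it."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if 0xC0 <= b <= 0xDF and i + 1 < len(raw):
            # Trusts the lead byte: the second byte is never inspected.
            out += raw[i:i + 2]
            i += 2
        elif b == 0x27:
            out += b"''"  # escape a quote by doubling it
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


def count_unescaped_quotes(escaped: bytes) -> int:
    """Toy consumer (hypothetical): reads byte by byte, the way a parser
    that gives up on an invalid sequence might, and counts lone quotes."""
    count = 0
    i = 0
    while i < len(escaped):
        if escaped[i] == 0x27:
            if i + 1 < len(escaped) and escaped[i + 1] == 0x27:
                i += 2  # doubled quote: properly escaped
                continue
            count += 1
        i += 1
    return count


benign = naive_escape(b"it's")
hostile = naive_escape(b"x\xc0'; --")
print(benign, count_unescaped_quotes(benign))    # b"it''s" 0
print(hostile, count_unescaped_quotes(hostile))  # b"x\xc0'; --" 1
```

In the benign case the quote is doubled as expected. In the hostile case the quote rides along behind 0xC0, the escaper never looks at it, and the byte-oriented consumer sees one bare quote.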
The following Python snippet demonstrates the type of errors that should occur when parsing invalid UTF-8 characters. It also highlights how interpreting the individual bytes separately can lead to unexpected behavior.
>>> # Decode a valid UTF-8 sequence (0xC2 0xA0 is U+00A0, the no-break space)
>>> t = b'hello\xc2\xa0you'
>>> print(t.decode())
hello you
>>>
>>>
>>> # Decode an invalid UTF-8 sequence (0xC0 can never start a valid character)
>>> t = b'hello\xc0\x27you'
>>> print(t.decode())
Traceback (most recent call last):
  File "<python-input-24>", line 1, in <module>
    print(t.decode())
          ~~~~~~~~^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5: invalid start byte
>>>
>>>
>>> # Map each byte of the invalid sequence to the Unicode code point of the same value (what a Latin-1 decode would give)
>>> chr(0xc0)
'À'
>>> chr(0x27)
"'"
>>>
>>> # Interpretation is the key to soooooo many security issues...
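One general way to avoid this class of problem is to refuse invalid input outright before it reaches any escaping or query-building code. The sketch below shows that idea in Python; `require_valid_utf8` is a hypothetical helper name, and this is a generic mitigation pattern, not a description of how PostgreSQL patched the issue.

```python
def require_valid_utf8(raw: bytes) -> str:
    """Return the decoded text, or raise ValueError if the bytes are not
    strictly valid UTF-8. Hypothetical helper for illustration only."""
    try:
        # errors="strict" is the default, spelled out here on purpose:
        # "ignore" or "replace" would silently reinterpret the input.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"rejecting input: invalid UTF-8 at byte offset {exc.start}"
        ) from exc


print(repr(require_valid_utf8(b"hello\xc2\xa0you")))  # 'hello\xa0you'

try:
    require_valid_utf8(b"hello\xc0\x27you")
except ValueError as err:
    print(err)  # rejecting input: invalid UTF-8 at byte offset 5
```

The design choice is to fail loudly rather than guess: once the input is rejected, no later component gets the chance to interpret the stray 0x27 on its own.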
The very existence of vulnerabilities like CVE-2025-1094 comes down to a few different factors, such as:
A developer sees an “unused” section in the UTF-8 byte map and, without realizing the consequences, lets those bytes through in an attempt to future-proof their code;
A developer and a reviewer fail to fully grasp the standard’s edge cases, leaving room for misinterpretation;
RFC 3629 explicitly warns about the security risks of improperly handling invalid sequences, yet these warnings are often overlooked;
Speed of development: a tired developer is always more likely to introduce subtle issues into their code;
Possible tampering with the supply chain by people who are aware of the consequences of small, hard-to-detect errors like this one. (I am not stating that this is the case here; I am only using this CVE as an example.)
As a developer, how often have you implemented something without being fully aware of a given standard or the presence of undefined behaviour? You’re probably not exactly sure what your answer to this question is… Let’s simplify the question. As a developer, how often have you relied on sources like Stack Overflow to help you out with a tricky piece of code without fully reviewing the documentation for the functions, methods, or anything else used by the code example?
If I were to bet, I would say it has probably happened a couple of times.
This is where code reviews are important. Reviewing a piece of code, a pull request for example, should not only be about the plainly obvious “does it follow our project coding standards” but also about the factors surrounding the implementation details, such as:
Was the RFC followed or not?
Were the functions/methods used properly and according to the documentation?
Am I actually sure, 100% without the shadow of a doubt, what function or method X does?
I have been reviewing code for a large part of my professional life. To this very day, I still go back from time to time and review the documentation of very basic functions that I’m either using or seeing in code under review. A large proportion of issues could be prevented by a simple documentation review, either at the time of implementation or during pull request review.
In the case of CVE-2025-1094, the mishandling of a single byte could be responsible for multiple ongoing or past breaches. Indeed, a closer look at the code history shows that a very large part of the vulnerable code was introduced to PostgreSQL in a commit made 9 years ago.
This vulnerability could have been lurking for years.
Even though writing code gets easier every day, it remains a complex process.
With the rise of AI-assisted programming and the ever-present need to deliver and adapt to new technology trends, developers have never been under as much pressure. Code reviews have also never been as important as they are now.
In these days of fast-paced development, one saying has never been as true: Slow is fast.
If you’re in a leadership position, how you manage developer productivity and performance is key to the long-term success of your development endeavour. Just remember one key thing: in security and software development, slow and deliberate beats fast and reckless every time.