Lately, I’ve been developing some software that makes heavy use of QUrls as resource locators for local files. Nothing wrong here: QUrl is a powerful way of sharing those locations in a universal way. The problem arises when you construct those QUrls from QStrings and forget that QUrls are meant for much more than representing local file locations.
Authority chunks of a URL
At the time of writing, the QUrl documentation, although quite complete, could be much more explanatory. For example, it says that the recommended way of creating a QUrl from a QString is:
* When creating an URL QString from a QByteArray or a char*,
always use QString::fromUtf8().
* Favor the use of QUrl::fromEncoded() and QUrl::toEncoded()
instead of QUrl(string) and QUrl::toString() when converting
QUrl to/from string.
But this is explained in the documentation for QUrl::fromUserInput() rather than in the Detailed Description [ 1 ].
What matters in this explanation is that using QUrl::fromEncoded() and QUrl::toEncoded() instead of QUrl(string) and QUrl::toString() is not merely a matter of preference but, I would say, a must if you don’t want to end up with bogus corner cases.
Why would this happen? Well, as I was saying, QUrl is meant for much more than universally representing the location of a file, so here are the big tips:
- If you want to get a QUrl from a local file path held in a QString, always use QUrl::fromLocalFile ( const QString & localFile ). Don’t use QUrl::QUrl ( const QString & url ) unless you want to end up with problems. Likewise, always get the path to the local file with QUrl::toLocalFile().
- If you want to get a QUrl from a QString representing a URL, make sure the QString actually holds a percent-encoded URL, as it must in order to be a valid URL, and always use QUrl::fromEncoded ( const QByteArray & input, ParsingMode parsingMode ) with QUrl::StrictMode as the QUrl::ParsingMode.
- If you want to get a string representation of a URL from a QUrl, always use QUrl::toEncoded ().
Bogus examples for each case:
Local file
/mypath/my#file.jpg
Correct:
QUrl myUrl = QUrl::fromLocalFile("/mypath/my#file.jpg");
Incorrect:
QUrl myUrl = QUrl("file:///mypath/my#file.jpg");
The problem here is the way QUrl treats the “#” character in the second example. Since it has no way of guessing otherwise, it assumes that the character delimits the fragment part of the URL.
Fragment part of a URL
As a result, calling:
myUrl.toLocalFile()
in the first case returns:
/mypath/my#file.jpg
while in the second it returns:
/mypath/my
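To make the fragment rule concrete, here is a minimal sketch in plain C++ (no Qt) of what a generic URL parser does with “#”. It is a simplified model, not QUrl’s actual implementation, and splitAtFragment and demoSplitAtFragment are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified model of generic URL parsing (NOT QUrl's real code):
// everything after the first '#' is taken to be the fragment.
// Returns {part before '#', fragment}; the fragment is empty if there is no '#'.
std::pair<std::string, std::string> splitAtFragment(const std::string &url)
{
    const std::string::size_type hash = url.find('#');
    if (hash == std::string::npos)
        return {url, std::string()};
    return {url.substr(0, hash), url.substr(hash + 1)};
}

// Hypothetical usage check mirroring the example above.
void demoSplitAtFragment()
{
    const auto parts = splitAtFragment("file:///mypath/my#file.jpg");
    assert(parts.first == "file:///mypath/my"); // the file name is cut short
    assert(parts.second == "file.jpg");         // the rest became the fragment
}
```

Once the “#” is percent-encoded as “%23”, the same function finds no fragment at all, which is why the encoded form is the safe one to pass around.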
Parsing mode
/mypath/my#file.jpg
(encoded) url representation:
file:///mypath/my%23file.jpg
Correct:
QUrl myUrl = QUrl::fromEncoded("file:///mypath/my%23file.jpg", QUrl::StrictMode);
Incorrect:
QUrl myUrl = QUrl::fromEncoded("file:///mypath/my%23file.jpg");
The problem here is the way QUrl treats the “%23” encoding in the second example. Although it is not explicitly explained in the documentation [ 2 ], QUrl uses QUrl::TolerantMode as the default ParsingMode. Therefore, it assumes that the input comes from a user who didn’t know better and was actually trying to pass “file:///mypath/my#file.jpg”. Once it has converted “%23” back to “#”, it again takes the character to delimit the fragment part of the URL.
As a result, calling:
myUrl.toLocalFile()
in the first case returns:
/mypath/my#file.jpg
while in the second it returns:
/mypath/my
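The difference between the two modes, as described above, boils down to the order of two steps: decoding “%23” and looking for the fragment delimiter. The following plain C++ sketch (no Qt) models only that ordering. It is an assumption-based illustration of the behavior described in this post, not QUrl’s parser, and all function names in it are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Decode %XX sequences; malformed sequences are copied through unchanged.
std::string percentDecode(const std::string &in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Drop everything from the first '#' onwards (the fragment).
std::string stripFragment(const std::string &url)
{
    return url.substr(0, url.find('#'));
}

// "Strict"-like order: find the fragment first, decode afterwards.
std::string strictLikePart(const std::string &encodedUrl)
{
    return percentDecode(stripFragment(encodedUrl));
}

// "Tolerant"-like order as described in the post: decode first, then split.
std::string tolerantLikePart(const std::string &encodedUrl)
{
    return stripFragment(percentDecode(encodedUrl));
}

// Hypothetical usage check mirroring the example above.
void demoParsingOrder()
{
    const std::string url = "file:///mypath/my%23file.jpg";
    assert(strictLikePart(url) == "file:///mypath/my#file.jpg");
    assert(tolerantLikePart(url) == "file:///mypath/my");
}
```

Decoding before splitting turns an escaped character back into a delimiter, which is exactly the information the percent-encoding was there to protect.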
Encoded usage
/mypath/my#file.jpg
(encoded) url representation:
file:///mypath/my%23file.jpg
(unencoded and wrong) url representation:
file:///mypath/my#file.jpg
Correct:
QUrl myUrl = QUrl::fromEncoded("file:///mypath/my%23file.jpg", QUrl::StrictMode);
Incorrect:
QUrl myUrl = QUrl("file:///mypath/my#file.jpg");
Here we have another incarnation of the very same problem as in the two examples above. Again, since QUrl has no way of guessing otherwise, it assumes that the “#” character delimits the fragment part of the URL.
As a result, calling:
myUrl.toLocalFile()
in the first case returns:
/mypath/my#file.jpg
while in the second it returns:
/mypath/my
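For completeness, the encoded representation used in these examples can be derived from the local path by percent-encoding it before prefixing the scheme. The plain C++ sketch below (no Qt) shows the idea under a few assumptions: it only handles absolute Unix-style paths, it encodes every byte outside the RFC 3986 unreserved set except “/”, and percentEncodePath and toFileUrl are hypothetical helpers, not what QUrl::fromLocalFile() does internally.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Percent-encode every byte except ASCII letters, digits, '-', '.', '_', '~'
// and the path separator '/'. Assumes the default "C" locale for isalnum.
std::string percentEncodePath(const std::string &path)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : path) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~' || c == '/') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];   // high nibble
            out += hex[c & 0x0F]; // low nibble
        }
    }
    return out;
}

// Build a file URL from an absolute path (hypothetical helper).
std::string toFileUrl(const std::string &absolutePath)
{
    return "file://" + percentEncodePath(absolutePath);
}

// Hypothetical usage check mirroring the example above.
void demoToFileUrl()
{
    assert(toFileUrl("/mypath/my#file.jpg") == "file:///mypath/my%23file.jpg");
}
```

In real Qt code there is no need for any of this: QUrl::fromLocalFile() followed by QUrl::toEncoded() is the route recommended in the tips above.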
Corollary:
The default behavior of QUrl is meant to make URL handling easy for the users of our programs, the end users, not for the users of QUrl, the developers. I find this quite awkward but, still, it is the Qt people’s decision and, as developers, we just have to take it into account when writing our code.
These bogus URLs, which QUrl::TolerantMode is meant to correct, usually come from a text entry box “à la” browser location bar, but this use case is actually not so common from the developer’s point of view. When dealing with URLs in our code, we have to take into account what a URL is and how it must be formatted and encoded to be valid. Therefore, if I receive a wrongly encoded URL, I should go to the code that provides it and fix the problem there rather than trying to cleverly guess what the proper URL should be. For example, the software I’m currently developing uses Tracker, and I rely on it to feed my code with properly formatted URLs. If for some reason Tracker gives me a wrongly encoded one, the place to solve it is Tracker, not my software. I should not, and must not, interpret what Tracker may have wanted to pass me; instead, I should open a bug in its bugzilla and provide as much accurate information as I can to help them solve the issue.
Just so my friend Iván Frade doesn’t kill me, note that Tracker has, so far, been dealing with URLs perfectly 🙂