{"id":176,"date":"2011-07-19T18:11:35","date_gmt":"2011-07-19T17:11:35","guid":{"rendered":"http:\/\/blog.andresgomez.org\/?p=176"},"modified":"2012-07-26T02:03:00","modified_gmt":"2012-07-26T01:03:00","slug":"qurl-misusage","status":"publish","type":"post","link":"https:\/\/blog.andresgomez.org\/es\/2011\/07\/19\/qurl-misusage\/","title":{"rendered":"QUrl (mis)usage"},"content":{"rendered":"<p>Lately, I&#8217;ve been developing some software which makes intensive use of QUrls as resource locators for local files. Nothing wrong here. QUrl is a powerful way of sharing those locations in a universal way. The problem comes when you construct those QUrls from QStrings and forget that QUrls are meant for much more than representing local file locations.<\/p>\n<div style=\"width: 410px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" title=\"Authority chunks on an URL\" src=\"http:\/\/doc.qt.nokia.com\/4.7\/images\/qurl-authority.png\" alt=\"Authority chunks on an URL\" width=\"400\" height=\"176\" \/><p class=\"wp-caption-text\">Authority chunks on an URL<\/p><\/div>\n<p>At the time of writing, the QUrl <a href=\"http:\/\/doc.qt.nokia.com\/4.7\/qurl.html\">documentation<\/a>, although quite complete, could be much more explanatory. 
For example, it says that the recommended way of creating a QUrl from a QString is:<\/p>\n<p style=\"font-size: x-small;\"><code><br \/>\n* When creating an URL QString from a QByteArray or a char*,<br \/>\nalways use QString::fromUtf8().<br \/>\n* Favor the use of QUrl::fromEncoded() and QUrl::toEncoded()<br \/>\ninstead of QUrl(string) and QUrl::toString() when converting<br \/>\nQUrl to\/from string.<br \/>\n<\/code><\/p>\n<p>But this is explained in the documentation for QUrl::fromUserInput(), instead of in the Detailed Description [ <a href=\"http:\/\/bugreports.qt.nokia.com\/browse\/QTBUG-20411\">1<\/a> ].<\/p>\n<p>What matters in this explanation is that using QUrl::from\/toEncoded() over QUrl::(from)toString() is not merely something to favor but, I would say, a must if you don&#8217;t want to end up with bogus corner cases.<\/p>\n<p>Why would this happen? Well, as I was saying, QUrl is meant for much more than universally representing the location of a file so, here go the <strong>big tips<\/strong>:<\/p>\n<ol>\n<li>If you want to get the QUrl from a local file represented with a QString, <strong>always<\/strong> use QUrl::fromLocalFile ( const QString &amp; localFile ) . Don&#8217;t use QUrl::QUrl ( const QString &amp; url ) if you want to avoid problems. 
In the same way, <strong>always<\/strong> get the path to the local file with QUrl::toLocalFile().<\/li>\n<li>If you want to get a QUrl from a QString representing an URL, be sure that the QString actually represents a <strong>percent encoded<\/strong> URL, as it must be to be a valid URL, and <strong>always<\/strong> use QUrl::fromEncoded ( const QByteArray &amp; input, ParsingMode parsingMode ), with <strong>QUrl::StrictMode<\/strong> as the QUrl::ParsingMode.<\/li>\n<li>If you want to get a QString representation of an URL from a QUrl, <strong>always<\/strong> use QUrl::toEncoded ().<\/li>\n<\/ol>\n<h3>Bogus examples for each case:<\/h3>\n<h4>Local file<\/h4>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p style=\"color: green;\">Correct:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl::fromLocalFile(\"\/mypath\/my#file.jpg\")<\/code><\/p>\n<p style=\"color: red;\">Incorrect:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl(\"file:\/\/\/mypath\/my#file.jpg\")<\/code><\/p>\n<p>The problem here is the way QUrl will treat the \u00ab#\u00bb character in the second example. 
It will assume, as it has no way of knowing otherwise, that the character delimits the fragment part of the URL.<\/p>\n<div style=\"width: 410px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" title=\"Fragment part on an URL\" src=\"http:\/\/doc.qt.nokia.com\/4.7\/images\/qurl-fragment.png\" alt=\"Fragment part on an URL\" width=\"400\" height=\"63\" \/><p class=\"wp-caption-text\">Fragment part on an URL<\/p><\/div>\n<p>As a result, calling:<\/p>\n<p style=\"font-size: x-small;\"><code>myUrl.toLocalFile()<\/code><\/p>\n<p>in the first case will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p>while in the second it will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my<\/code><\/p>\n<h4>Parsing mode<\/h4>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p>(encoded) URL representation:<\/p>\n<p style=\"font-size: x-small;\"><code>file:\/\/\/mypath\/my%23file.jpg<\/code><\/p>\n<p style=\"color: green;\">Correct:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl::fromEncoded(\"file:\/\/\/mypath\/my%23file.jpg\", QUrl::StrictMode)<\/code><\/p>\n<p style=\"color: red;\">Incorrect:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl::fromEncoded(\"file:\/\/\/mypath\/my%23file.jpg\")<\/code><\/p>\n<p>The problem here is the way QUrl will treat the \u00ab%23\u00bb encoding in the second example. Although it is not explicitly explained in the documentation [ <a href=\"http:\/\/bugreports.qt.nokia.com\/browse\/QTBUG-20399\">2<\/a> ], QUrl will use QUrl::TolerantMode as ParsingMode by default. Therefore, it will assume that the input comes from an unaware user who was actually trying to pass \u00abfile:\/\/\/mypath\/my#file.jpg\u00bb. 
Again, after converting \u00ab%23\u00bb back to \u00ab#\u00bb, it will take the character as delimiting the fragment part of the URL.<\/p>\n<p>As a result, calling:<\/p>\n<p style=\"font-size: x-small;\"><code>myUrl.toLocalFile()<\/code><\/p>\n<p>in the first case will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p>while in the second it will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my<\/code><\/p>\n<h4>Encoded usage<\/h4>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p>(encoded) URL representation:<\/p>\n<p style=\"font-size: x-small;\"><code>file:\/\/\/mypath\/my%23file.jpg<\/code><\/p>\n<p>(unencoded and wrong) URL representation:<\/p>\n<p style=\"font-size: x-small;\"><code>file:\/\/\/mypath\/my#file.jpg<\/code><\/p>\n<p style=\"color: green;\">Correct:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl::fromEncoded(\"file:\/\/\/mypath\/my%23file.jpg\", QUrl::StrictMode)<\/code><\/p>\n<p style=\"color: red;\">Incorrect:<\/p>\n<p style=\"font-size: x-small;\"><code>QUrl myUrl = QUrl(\"file:\/\/\/mypath\/my#file.jpg\")<\/code><\/p>\n<p>Here, we have another incarnation of the very same problem as in the two examples above. QUrl will assume, again, as it has no way of knowing otherwise, that the character delimits the fragment part of the URL.<\/p>\n<p>As a result, calling:<\/p>\n<p style=\"font-size: x-small;\"><code>myUrl.toLocalFile()<\/code><\/p>\n<p>in the first case will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my#file.jpg<\/code><\/p>\n<p>while in the second it will return:<\/p>\n<p style=\"font-size: x-small;\"><code>\/mypath\/my<\/code><\/p>\n<h3>Corollary:<\/h3>\n<p>The default behavior of QUrl is to provide easy handling of URLs to the user of our programs, the <strong>end user<\/strong>, but not to the user of QUrl, the developer. 
I find this quite awkward but, still, it is a decision of the Qt people and, as developers, we just have to take this into account when writing our code.<\/p>\n<p>These bogus URLs, which the QUrl::TolerantMode ParsingMode is meant to correct, usually come from a text entry box \u00ab\u00e0 la\u00bb browser location bar, but this use case is, actually, not so common from the developer&#8217;s point of view. When dealing with URLs in our code we have to take into account what an URL is and how it should be formatted\/encoded to be valid. Therefore, if I&#8217;m receiving a wrongly encoded URL I should go to the source code providing this URL and fix the problem there rather than trying to smartly guess what the proper URL should be. For example, in my software currently in development we use <a href=\"http:\/\/www.tracker-project.org\/\">Tracker<\/a> and I rely on it to feed my code with properly formatted URLs. If for some reason Tracker gives me a wrongly encoded one, the place for solving it is, actually, Tracker, and not my software. I should not and must not interpret what Tracker may have wanted to pass me, but open a bug in its bugzilla and provide as accurate information as I can to help them solve this issue.<\/p>\n<p>Just so my friend <a href=\"https:\/\/dz015.wordpress.com\/\">Iv\u00e1n Frade<\/a> doesn&#8217;t kill me, note that Tracker is, so far, dealing with URLs perfectly \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lately, I&#8217;ve been developing some software which makes intensive use of QUrls as resource locators for local files. Nothing wrong here. QUrl is a powerful way of sharing those locations in a universal way. 
The problem is &hellip; <a href=\"https:\/\/blog.andresgomez.org\/es\/2011\/07\/19\/qurl-misusage\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,6,7,9,19,11,20,12,21],"tags":[],"class_list":["post-176","post","type-post","status-publish","format-standard","hentry","category-english","category-free-software","category-general","category-igaliacom","category-linkedin","category-meego","category-mobile","category-planetigaliacom","category-qt"],"_links":{"self":[{"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/posts\/176"}],"collection":[{"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/comments?post=176"}],"version-history":[{"count":3,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/posts\/176\/revisions"}],"predecessor-version":[{"id":437,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/posts\/176\/revisions\/437"}],"wp:attachment":[{"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/media?parent=176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/categories?post=176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.andresgomez.org\/es\/wp-json\/wp\/v2\/tags?post=176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}