Using URLDecoder to decode URL-format paths is not safe unless paths have been encoded with URLEncoder.

Summary

If the instance (workspace) area, configuration area, installation area or working directory of a built product contains a '+' anywhere in its path, this problem is likely to be encountered sooner or later.

The latest example of this is an installation located at C:\SysdynVensim_3+Simupedia. Trying to run the product produced:

java.lang.IllegalArgumentException: Invalid path for FastLZ library: C:\SysdynVensim_3 Simupedia\simantics-sysdyn\configuration\org.eclipse.osgi\324\0.cp\fastlz-windows-x86_64.dll

Note that the '+' in the directory name has become a space in the reported path. In this particular case the problem occurred with the configuration area.

Details

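URLEncoder and URLDecoder implement the application/x-www-form-urlencoded format, not general URL path escaping. The comment in the URLEncoder source explains, with reference to RFC 2396, which characters are left unencoded: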

        /* The list of characters that are not encoded has been
         * determined as follows:
         *
         * RFC 2396 states:
         * -----
         * Data characters that are allowed in a URI but do not have a
         * reserved purpose are called unreserved.  These include upper
         * and lower case letters, decimal digits, and a limited set of
         * punctuation marks and symbols.
         *
         * unreserved  = alphanum | mark
         *
         * mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
         *
         * Unreserved characters can be escaped without changing the
         * semantics of the URI, but this should not be done unless the
         * URI is being used in a context that does not allow the
         * unescaped character to appear.
         * -----
         *
         * It appears that both Netscape and Internet Explorer escape
         * all special characters from this list with the exception
         * of "-", "_", ".", "*". While it is not clear why they are
         * escaping the other characters, perhaps it is safest to
         * assume that there might be contexts in which the others
         * are unsafe if not escaped. Therefore, we will use the same
         * list. It is also noteworthy that this is consistent with
         * O'Reilly's "HTML: The Definitive Guide" (page 164).
         *
         * As a last note, Intenet Explorer does not encode the "@"
         * character which is clearly not unreserved according to the
         * RFC. We are being consistent with the RFC in this matter,
         * as is Netscape.
         *
         */

URLEncoder does the following:

            if (dontNeedEncoding.get(c)) {
                if (c == ' ') {
                    c = '+';
                    needToChange = true;
                }
                //System.out.println("Storing: " + c);
                out.append((char)c);
                i++;
            } else {

and URLDecoder does the opposite, i.e. URLEncoder encodes a space ' ' as '+' and URLDecoder converts '+' back to ' '.
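The round trip, and the way it breaks for a path that never went through URLEncoder, can be reproduced with a small sketch like the following. The class and method names are made up for illustration, and the path is only an illustrative URL-style variant of the installation path from the summary.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of the product code.
final class PlusDecodingDemo {

    // URLEncoder turns ' ' into '+' and a literal '+' into "%2B".
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URLDecoder turns '+' into ' ' and "%XX" into the escaped character.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Round trip through URLEncoder and URLDecoder is lossless.
        String encoded = encode("a b+c");
        System.out.println(encoded);          // a+b%2Bc
        System.out.println(decode(encoded));  // a b+c

        // A path with a literal '+' that was NOT produced by URLEncoder
        // loses the '+' when passed through URLDecoder.
        String urlPath = "/C:/SysdynVensim_3+Simupedia/configuration";
        System.out.println(decode(urlPath));  // /C:/SysdynVensim_3 Simupedia/configuration
    }
}
```

The last call shows the same substitution that appears in the IllegalArgumentException message above.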

We use URLDecoder in tens of places to decode URL-format paths that were never encoded with URLEncoder. These paths come from e.g. Eclipse's FileLocator, which does not use the exact same encoding, so a literal '+' in such a path is wrongly decoded into a space.
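One possible direction, shown here only as a sketch and not as the agreed fix, is to protect literal '+' characters before handing the path to URLDecoder, so that only %XX escapes are decoded. The class name PlusSafePathDecoder and the method decodePath are hypothetical. The sketch assumes that the input paths never use '+' to mean a space, which is exactly the assumption that would have to be verified for backwards compatibility.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper; the name and its placement are assumptions.
final class PlusSafePathDecoder {

    // Decodes %XX escapes in a URL-format path while keeping '+' literal.
    // Assumption: the path was not produced by URLEncoder, so '+' never
    // stands for a space in the input.
    static String decodePath(String urlPath) {
        try {
            // Escape '+' first so URLDecoder maps it back to '+' instead of ' '.
            return URLDecoder.decode(urlPath.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Prints /C:/SysdynVensim_3+Simupedia/my dir
        System.out.println(decodePath("/C:/SysdynVensim_3+Simupedia/my%20dir"));
    }
}
```

Paths that really were encoded with URLEncoder would no longer have '+' turned back into a space by this helper, so it cannot simply replace every existing URLDecoder call without checking where each path comes from.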

I'm unsure how to guarantee a backwards-compatible fix for this. It needs more thought.

Edited by Tuukka Lehtonen