Commits
- Commit:
ef04b55160759b22db67f14c703a4343c4741e8b
- From:
- Omar Polo <op@omarpolo.com>
- Date:
switch to Bjoern Hoehrmann UTF-8 decoder
It's correct, whereas my hacked valid_multibyte_utf8 would accept byte
sequences that aren't valid UTF-8.
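To make "things that aren't technically UTF-8" concrete, here is a minimal sketch of a strict validator. It is not Bjoern Hoehrmann's table-driven DFA and not gmid's code: it is a plain hand-written check, and the name utf8_is_valid is hypothetical. It rejects the usual cases a loose check lets through: overlong encodings, UTF-16 surrogates, code points above U+10FFFF, and truncated sequences.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of gmid): return 1 if buf[0..len) is
 * well-formed UTF-8 as defined by RFC 3629, 0 otherwise.
 */
int
utf8_is_valid(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		uint32_t cp, min;
		size_t n, j;

		if (c < 0x80) {			/* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {	/* 2-byte sequence */
			n = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {	/* 3-byte sequence */
			n = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {	/* 4-byte sequence */
			n = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return 0;	/* stray continuation or bad lead byte */
		}

		if (len - i - 1 < n)
			return 0;	/* truncated sequence */

		for (j = 1; j <= n; j++) {
			if ((buf[i + j] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (buf[i + j] & 0x3F);
		}

		if (cp < min)
			return 0;	/* overlong encoding */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return 0;	/* UTF-16 surrogate */
		if (cp > 0x10FFFF)
			return 0;	/* beyond the Unicode range */

		i += n + 1;
	}
	return 1;
}
```

For instance, the two bytes 0xC0 0xAF (an overlong encoding of '/') look like a well-formed 2-byte sequence to a check that only inspects the bit patterns, but are rejected here.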
- Commit:
f722f3c5aab71f48c1001d2e1c3f1fdf77d4a1ae
- From:
- Omar Polo <op@omarpolo.com>
- Date:
typos
- Commit:
00781742c5578afa15d0b2dbc86adf47870fb94f
- From:
- Omar Polo <op@omarpolo.com>
- Date:
reject %00
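As an illustration of why %00 matters: once decoded, it becomes a NUL byte that would silently truncate a C string. The sketch below shows one way to refuse it while percent-decoding in place. The function names pct_decode and hexval are hypothetical and do not come from gmid's source.

```c
#include <stddef.h>

/* Return the value of a hex digit, or -1 if c is not one. */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical helper: percent-decode the NUL-terminated string s in
 * place.  Returns 1 on success, 0 on a malformed escape or on %00.
 * On failure the buffer may be left partially decoded.
 */
int
pct_decode(char *s)
{
	char *r = s, *w = s;
	int hi, lo;

	while (*r != '\0') {
		if (*r != '%') {
			*w++ = *r++;
			continue;
		}

		/* r[1] is checked first, so r[2] is never read past the NUL */
		if ((hi = hexval((unsigned char)r[1])) == -1)
			return 0;
		if ((lo = hexval((unsigned char)r[2])) == -1)
			return 0;
		if (hi == 0 && lo == 0)
			return 0;	/* reject %00 */

		*w++ = (char)(hi * 16 + lo);
		r += 3;
	}
	*w = '\0';
	return 1;
}
```

Decoding in place is safe because the output is never longer than the input.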
- Commit:
df6ca41da36c3f617cbbf3302ab120721ebfcfd2
- From:
- Omar Polo <op@omarpolo.com>
- Date:
IRI support
This extends the URI parser so that it supports full IRIs
(Internationalized Resource Identifiers, RFC3987). Some areas of it
can be improved, but here's a start.
Note: we assume UTF-8 encoded IRIs.
- Commit:
33d32d1fd66a577f22f3f33f238e8dac44ec9995
- From:
- Omar Polo <op@omarpolo.com>
- Date:
implement a valid RFC3986 (URI) parser
Up until now I used a "poor man's" approach: the URI parser is barely a
parser, it tries to extract the path from the request, with some minor
checking, and that's all. This obviously is not RFC3986-compliant.
The new RFC3986 (URI) parser should be fully compliant. It may accept
some invalid URIs, but shouldn't reject or mis-parse valid ones. (In
particular, the rule for the path is way more relaxed in this parser
than it is in the RFC text.)
A difference with RFC3986 is that we don't even try to parse the
(optional) userinfo part of a URI: following the Gemini spec we treat
it as an error.
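A minimal sketch of the userinfo rule, under stated assumptions: the authority is taken to be the text between the first "://" and the next '/', '?' or '#', and any '@' inside it means userinfo is present. The function name authority_has_userinfo is hypothetical, and this is not how gmid's parser is structured.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the authority component of uri
 * contains a userinfo part (i.e. an '@'), 0 otherwise.  A caller
 * following the Gemini spec would reject the request when this
 * returns 1.
 */
int
authority_has_userinfo(const char *uri)
{
	const char *p;
	size_t n;

	if ((p = strstr(uri, "://")) == NULL)
		return 0;		/* no authority component */
	p += 3;

	/* the authority ends at the first '/', '?', '#' or end of string */
	n = strcspn(p, "/?#");

	return memchr(p, '@', n) != NULL;
}
```

An '@' that appears later, in the path or query, is not userinfo and is left alone by this check.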
A further caveat is that %2F in the path part of the URI is
indistinguishable from a literal '/': this is NOT conforming, but given
the scope and use of gmid, I don't see how to treat a %2F sequence in
the path (reject the URI?).