This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/db-wg@ripe.net/
[db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes
- Previous message (by thread): [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes
- Next message (by thread): [db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Edward Shryane
eshryane at ripe.net
Fri Nov 24 10:42:11 CET 2023
Hi Job,

> On 24 Nov 2023, at 10:21, Job Snijders <job at fastly.com> wrote:
>
> Dear Edward,
>
> On Fri, Nov 24, 2023 at 10:03:15AM +0100, Edward Shryane via db-wg wrote:
>> Currently the RIPE database only allows a subset of ASCII characters
>> in the "org-name:", "person:" and "role:" attributes, for a few
>> reasons including:
>>
>> * These attributes are also a look-up key and the Whois protocol does
>> not allow specifying character sets in queries.
>> * RPSL names are ASCII according to RFC2622
>> * Using a normalised name makes the object easier to query
>> * Reading a normalised name is easier to interpret
>>
>> However there are some drawbacks to forcing names to only use a subset
>> of ASCII characters:
>>
>> * Organisations, roles and persons cannot use their actual name if it
>> includes characters outside this subset.
>> * Normalisation is not standard, but is an interpretation done by each
>> maintainer, e.g. characters could be excluded or converted in
>> different ways.
>
> The above two points are key in making the RIPE database useful and
> accessible to everyone, I too would love to see those points addressed.
>
>> Since we support the Latin-1 character set in the RIPE database, I
>> propose we also allow non-ASCII Latin-1 characters in these
>> attributes.
>>
>> Querying for a name can be done either using the latin-1 characters
>> (proposed) or a normalised, ASCII representation (currently). The
>> normalised version will be generated by Whois and stored in a database
>> index for querying. The primary key will also be generated from the
>> normalised version.
>>
>> Please let me know your feedback.
>
> Wouldn't it be an opportune time to support UTF-8 instead of LATIN-1?
> As I understand it, through the use of UTF-8 more languages could be
> supported. UTF-8 seems to be the preferred character encoding in any new
> IETF work (for good reason).
>

I wrote an impact analysis on UTF-8 in the RIPE database last year:
https://labs.ripe.net/author/ed_shryane/impact-analysis-for-utf-8-in-the-ripe-database/

We already support UTF-8 in the Whois REST API and on the website, but
convert to/from Latin-1 in the database.

Switching to UTF-8 in the database is not technically difficult, but we
need functional requirements from the community on where to allow UTF-8
characters.

This proposal is only to allow more Latin-1 characters in names, while
preserving backwards compatibility for querying (by also doing
normalisation to ASCII).

> Have the effects of LATIN-1 on downstream applications such as NRTM v3
> and NRTM v4 been considered?

Allowing Latin-1 in these name attributes *does* impact NRTMv3 and
NRTMv4 (as they will no longer be ASCII only), but these characters are
already allowed elsewhere in RPSL (e.g. the workaround of putting the
correct name in the "descr:" attribute). Also, the object primary key
will remain ASCII.

>
> You indicate that LATIN-1 already is supported in the RIPE database, so
> I imagine you and the team already deliberated on the pro's and con's of
> UTF-8 vs LATIN-1; and as such concluded with this particular
> recommendation. I just wanted to make sure to raise these questions. :-)
>

We can switch to UTF-8; this proposal allows more characters in those
attributes without needing to change the database character set.

> Some interesting reading material on UTF-8 https://utf8everywhere.org/
>
> Kind regards,
>
> Job

Regards
Ed Shryane
RIPE NCC
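
For illustration only, here is a minimal sketch of the kind of
normalisation discussed above: checking that a name fits in Latin-1 and
deriving an ASCII form that could be stored in a lookup index. This is
not the Whois implementation. The function names and the FALLBACKS table
are hypothetical, and the actual normalisation rules are not specified
in this thread.

```python
import unicodedata

# Hypothetical fallback table (an assumption, not a RIPE rule) for Latin-1
# letters that have no Unicode decomposition into a base letter plus accent.
FALLBACKS = {
    "ß": "ss", "Æ": "AE", "æ": "ae", "Ø": "O", "ø": "o",
    "Ð": "D", "ð": "d", "Þ": "Th", "þ": "th",
}


def is_latin1(name: str) -> bool:
    """Return True if every character of the name can be encoded as Latin-1."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def normalise_name(name: str) -> str:
    """Derive an ASCII-only form of a name, e.g. for a query index."""
    # Replace letters that do not decompose, using the assumed table above.
    replaced = "".join(FALLBACKS.get(ch, ch) for ch in name)
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", replaced)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that is still not ASCII.
    return stripped.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for candidate in ["José Müller", "Søren Åberg", "Łukasz"]:
        print(candidate, is_latin1(candidate), normalise_name(candidate))
```

With a scheme like this, a query for either the stored Latin-1 name or
its ASCII form could be matched against the same index entry, provided
the query string is normalised the same way before the lookup.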