[db-wg] Proposal to allow non-ASCII characters in "org-name:", "person:" and "role:" attributes
Edward Shryane
eshryane at ripe.net
Tue May 28 18:13:32 CEST 2024
Dear colleagues,

It was pointed out that the ARIN example:

whois -h whois.arin.net POC SHRYA12-ARIN

is not correct, and should read:

whois -h whois.arin.net "p SHRYA12-ARIN"

(I used "POC" instead of "p" and that could either cause "POC" to be additionally returned, or no objects at all, depending on your whois client).

Apologies,
Ed Shryane
RIPE NCC

> On 28 May 2024, at 11:27, Edward Shryane <eshryane at ripe.net> wrote:
>
> Dear colleagues,
>
> There was a question about UTF-8 support by major Whois providers during last week's DB-WG session at RIPE88.
>
> During the UTF-8 discussion in December I checked the other RIRs as follows:
>
> LACNIC: only Latin-1 encoded characters are accepted in updates (UTF-8 is ignored) but UTF-8 is returned on port 43.
> Example: whois -h whois.lacnic.net PAP12
> APNIC: only Latin-1 is returned
> Example: whois -h testwhois.apnic.net YYYYMMDD-MNT
>
> Subsequently I tested the other RIRs to be sure:
>
> ARIN: UTF-8 is supported in the RPSL object and UTF-8 is returned on port 43.
> Example: whois -h whois.arin.net POC SHRYA12-ARIN
> AFRINIC: UTF-8 characters are accepted in updates and UTF-8 is returned on port 43.
> Example: whois -h whois.afrinic.net SHRYANE-MNT
>
> RIPE stores Latin-1 and returns Latin-1 on port 43.
>
> So in summary, 3 RIRs return UTF-8 and 2 RIRs return Latin-1 on port 43.
>
> Regards
> Ed Shryane
> RIPE NCC
>
>
>
>> On 2 May 2024, at 16:02, Edward Shryane <eshryane at ripe.net> wrote:
>>
>> Dear colleagues,
>>
>> To follow up on the UTF-8 discussion in January, the DB team plans to implement support for UTF-8 in 3 phases:
>>
>> (1) Add a flag to allow a client to choose a character set
>>
>> In the Whois release 1.112, we have added the "-Z / --charset" query flag to allow clients to specify which character set they expect. The server response will encode RPSL objects using that character set.
>>
>> This new flag can already be tested in the RC environment, e.g. the SHRYANE-MNT object contains "remarks:" attributes with non-ASCII (but still latin-1) characters:
>>
>> $ whois -h whois-rc.ripe.net -r shryane-mnt
>> $ whois -h whois-rc.ripe.net -r -Z utf8 shryane-mnt
>>
>> This flag has no impact on the default behaviour of the RIPE database. This change only affects port 43, and the default character set remains latin-1.
>>
>> This flag will already be useful, for example, to capture responses as UTF-8 to a file or to use UTF-8 encoding in your terminal. In future, if the default on port 43 changes to UTF-8, then clients can keep latin-1 by using "-Z/--charset latin1".
>>
>> (2) Convert the database schema to UTF-8
>>
>> In the following Whois release, the DB team plans to switch the RIPE database schema character set from latin-1 to UTF-8. This will allow Whois to store UTF-8 strings in the database index tables.
>>
>> Switching the database schema character set will involve about 1 hour of downtime to Whois updates, and Whois queries will not be affected. We will announce this change in advance.
>>
>> This change will have no impact on the default behaviour of the RIPE database. All interfaces will behave as before, and RPSL objects will remain latin-1 encoded internally.
>>
>> (3) Allow UTF-8 to be used in RPSL objects
>>
>> Once the RIPE database schema supports the UTF-8 character set, the DB team will create a further Whois release that will allow UTF-8 to be used in RPSL objects, in addition to the index tables.
>>
>> The default behaviour of the RIPE database will remain the same. All interfaces will behave as before, but RPSL objects will use UTF-8 internally.
>>
>> In future, if the DB-WG decides to allow UTF-8 characters in RPSL, the database will already support it.
>>
>> Regards
>> Ed Shryane
>> RIPE NCC
>>
>>
>>> On 18 Jan 2024, at 10:34, Edward Shryane <eshryane at ripe.net> wrote:
>>>
>>> Dear colleagues,
>>>
>>> Based on the discussion regarding UTF-8 in the RIPE database during the interim meeting yesterday, I suggest that we implement support for UTF-8 in the database (i.e. convert the schema and add a flag to allow a client to choose a character set), but we do not allow additional characters for now, pending further DB-WG discussion. Our intention is to lay the groundwork for future support, without breaking existing functionality. If you have any concerns or objections please let me know.
>>>
>>> We will now prepare an implementation plan / impact analysis of these changes.
>>>
>>> Regards
>>> Ed Shryane
>>> RIPE NCC
>>>
>>>
>>>> On 24 Nov 2023, at 10:03, Edward Shryane via db-wg <db-wg at ripe.net> wrote:
>>>>
>>>> Dear colleagues,
>>>>
>>>> Currently the RIPE database only allows a subset of ASCII characters in the "org-name:", "person:" and "role:" attributes, for a few reasons including:
>>>>
>>>> * These attributes are also a look-up key and the Whois protocol does not allow specifying character sets in queries.
>>>> * RPSL names are ASCII according to RFC2622
>>>> * Using a normalised name makes the object easier to query
>>>> * Reading a normalised name is easier to interpret
>>>>
>>>> However, there are some drawbacks to forcing names to only use a subset of ASCII characters:
>>>>
>>>> * Organisations, roles and persons cannot use their actual name if it includes characters outside this subset.
>>>> * Normalisation is not standard, but is an interpretation done by each maintainer, e.g. characters could be excluded or converted in different ways.
>>>>
>>>> Since we support the Latin-1 character set in the RIPE database, I propose we also allow non-ASCII Latin-1 characters in these attributes.
>>>>
>>>> Querying for a name can be done either using the latin-1 characters (proposed) or a normalised, ASCII representation (currently). The normalised version will be generated by Whois and stored in a database index for querying. The primary key will also be generated from the normalised version.
>>>>
>>>> Please let me know your feedback.
>>>>
>>>> Regards
>>>> Ed Shryane
>>>> RIPE NCC
>>>>
>>>> ---
>>>>
>>>> Whois attribute verbose description (copied from the help text).
>>>>
>>>> org-name
>>>> --------
>>>> Specifies the name of the organisation that this organisation object
>>>> represents in the RIPE Database. This is an ASCII-only text attribute.
>>>> The restriction is because this attribute is a look-up key and the
>>>> whois protocol does not allow specifying character sets in queries.
>>>> The user can put the name of the organisation in non-ASCII character
>>>> sets in the "descr:" attribute if required.
>>>>
>>>> A list of 1 to 30 words separated by white space.
>>>> A word is made up of ASCII alphanumeric characters and additionally: ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive.
>>>> Each word can have any combination of the above characters with no restriction on the start or end of a word.
>>>>
>>>> person
>>>> ------
>>>> Specifies the full name of an administrative, technical or zone
>>>> contact person for other objects in the database.
>>>>
>>>> It should contain 2 to 10 words.
>>>> A word is made up of ASCII alphanumeric characters and additionally: .`'_-
>>>> The first word should begin with a letter.
>>>> At least one other word should also begin with a letter.
>>>> Max 64 characters can be used in each word.
>>>>
>>>> role
>>>> ----
>>>> Specifies the full name of a role entity, e.g. RIPE DBM.
>>>>
>>>> A list of 1 to 30 words separated by white space.
>>>> A word is made up of ASCII alphanumeric characters and additionally: ][)(._"*@,&:!'`+/-
>>>> A word may have up to 64 characters and is not case sensitive.
>>>> Each word can have any combination of the above characters with no restriction on the start or end of a word.
>>>>
>>>>
>>>> --
>>>>
>>>> To unsubscribe from this mailing list, get a password reminder, or change your subscription options, please visit: https://lists.ripe.net/mailman/listinfo/db-wg
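
For readers who want to experiment with the current restrictions, the "org-name:", "person:" and "role:" syntax rules quoted in the help text above can be sketched as a small Python validator. This is only an illustration of the rules as worded there, not the Whois server's implementation: the function and constant names are hypothetical, and the "should" wording for "person:" is assumed to be a hard requirement.

```python
import re
import string

# Extra characters allowed in a word, copied from the quoted help text.
ORG_ROLE_EXTRA = "][)(._\"*@,&:!'`+/-"
PERSON_EXTRA = ".`'_-"

MAX_WORD_LENGTH = 64  # "A word may have up to 64 characters"


def _word_pattern(extra_chars):
    """Regex for one word: ASCII alphanumerics plus the given extra characters."""
    char_class = "[A-Za-z0-9" + re.escape(extra_chars) + "]"
    return re.compile(char_class + "{1,%d}" % MAX_WORD_LENGTH)


ORG_ROLE_WORD = _word_pattern(ORG_ROLE_EXTRA)
PERSON_WORD = _word_pattern(PERSON_EXTRA)


def is_valid_org_or_role_name(name):
    """Check the 'org-name:' / 'role:' rules: 1 to 30 valid words."""
    words = name.split()  # words are separated by white space
    if not 1 <= len(words) <= 30:
        return False
    return all(ORG_ROLE_WORD.fullmatch(word) for word in words)


def is_valid_person_name(name):
    """Check the 'person:' rules: 2 to 10 valid words, letter-initial words."""
    words = name.split()
    if not 2 <= len(words) <= 10:
        return False
    if not all(PERSON_WORD.fullmatch(word) for word in words):
        return False
    # The first word must begin with an ASCII letter ...
    if words[0][0] not in string.ascii_letters:
        return False
    # ... and so must at least one of the remaining words.
    return any(word[0] in string.ascii_letters for word in words[1:])
```

Under these assumptions, is_valid_person_name("Edward Shryane") and is_valid_org_or_role_name("RIPE DBM") both pass, while a name such as "José García" is rejected because of its non-ASCII Latin-1 characters. That rejection is the drawback the proposal aims to remove.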