This archive is retained to ensure existing URLs remain functional. It will not contain any emails sent to this mailing list after July 1, 2024. For all messages, including those sent before and after this date, please visit the new location of the archive at https://mailman.ripe.net/archives/list/db-wg@ripe.net/

[db-wg] UTF8

Previous message (by thread): [db-wg] db-wg Digest, Vol 44, Issue 7
Next message (by thread): [db-wg] UTF8


denis walker ripedenis at yahoo.co.uk
Wed May 6 11:13:28 CEST 2015

Hi Piotr
Thanks for the clarification. I don't think it makes sense to restrict UTF8 to only those character sets defined within the RIPE region. (I am not sure it is even technically possible.) If a Chinese person lives and works in this region, why would they not be able to enter their correct name? Just for argument's sake, changing my name into Chinese with Google Translate changes the space to a '.'. If that is correct then the current syntax check fails.
Also "person:", "role:" and "org-name:" are all defined as 'lookup keys'. That means you can enter their values in a query as the query string and that will be searched on in the database. The individual 'words' from these attribute values are stored in index tables in the database and searched as part of the query to return objects with matching values. I believe it is problematic to do string comparison in UTF8.
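One concrete reason string comparison in UTF8 is awkward: the same visible name can be encoded as different code point sequences, so a byte-for-byte or code-point comparison of indexed words can miss a match. The sketch below only illustrates the problem and one common mitigation (Unicode normalisation plus case folding). The function name index_key is hypothetical and this is not a description of how the RIPE Database software builds its indexes.

```python
import unicodedata

# The same visible word, encoded two ways.
composed = "Zo\u00eb"      # 'ë' as a single code point
decomposed = "Zoe\u0308"   # 'e' followed by a combining diaeresis


def index_key(word: str) -> str:
    """Hypothetical index key: NFC-normalise, then case-fold."""
    return unicodedata.normalize("NFC", word).casefold()


# A plain comparison treats the two spellings as different words.
naive_match = composed == decomposed

# Comparing the normalised keys treats them as the same word.
normalised_match = index_key(composed) == index_key(decomposed)
```

If both the stored index words and the query string went through the same key function, the two spellings would match. Whether that is sufficient for every script is a separate question.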
Also the Full Text Search allows searches on all these attributes as well as "address:", "descr:" and "remarks:". Again all the component parts of these values are indexed for this search.
So allowing any attribute in UTF8 may require software changes and may restrict some of the services the database currently provides. If you cannot rely on a search returning the correct objects then you cannot allow those searches.

There was a Labs article written some time ago on UTF8: https://labs.ripe.net/Members/kranjbar/internationalisation-of-ripe-database
This article put forward the idea of keeping all existing attributes in ASCII (but really meant Latin1) and allowing additional optional attributes for name and contact details in the local language. I think that would be a good first step to provide the additional benefits of localisation without breaking any of the current functionality. Even if it was only an interim step, it would allow time to assess any issues and monitor the usefulness of these new attributes.
cheers
Denis Walker
Independent Netizen

On 06/05/2015 09:56, Piotr Strzyzewski wrote:

On Fri, May 01, 2015 at 01:53:27PM +0000, denis walker wrote:

Dear Denis

Thanks for your valuable input.

Just to be clear, you refer to free text attributes. This has a
specific meaning in terms of database syntax checks. It applies to
those attributes where no syntax checks are done, for example
"address:", "descr:", "remarks:". Is your proposal only referring to
these attributes? I trust you do not mean all attributes other than

I have deliberately used the "free text" characteristic instead of
<freeform> grammar element used in RIPE Database Documentation.

So, to be clear - yes, I meant also "person:", "role:" and "org-name:".

primary keys. Incidentally, although "person:", "role:" and
"org-name:" are not primary keys, they are not free text either.

Taking the above into account one can observe that according to the RIPE
Database Documentation "person:" attribute is somehow less restricted
than "address:", "descr:" and "remarks:" attributes (limited to Latin1) ;-)

In contrast to <role-name> and <organisation-name> which use the
"alphanumeric characters" characteristic, the <person-name> uses the
"letter" one. And since "letter" is not defined anywhere, my
understanding of this word _could_ be different than yours. ;-)

Currently there are syntax checks done on these values. If you allow
these in UTF8 then all these syntax checks will have to be dropped.

I disagree that all of them will have to be dropped. For example, the
attribute length or number of words separated by space is quite
independent from the character set.
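As a rough illustration of checks that do not depend on the character set, the sketch below counts whitespace-separated words and code points. The function name check_name and all limits are hypothetical and are not taken from the RIPE Database syntax rules.

```python
def check_name(value: str, min_words: int = 2, max_words: int = 30,
               max_len: int = 64) -> bool:
    """Hypothetical syntax check using only word count and length.

    Length is counted in code points, not bytes, so a UTF-8 encoded
    value may occupy more bytes than max_len.
    """
    words = value.split()  # splits on any run of whitespace
    if not (min_words <= len(words) <= max_words):
        return False
    return len(value) <= max_len


latin_ok = check_name("Denis Walker")
cyrillic_ok = check_name("Денис Уокер")
single_word = check_name("Denis")
# A name whose parts are joined by a middle dot instead of a space
# counts as one word, so a "two or more words" rule rejects it.
dotted = check_name("丹尼斯·沃克")
```

The last case is the situation raised earlier in the thread: the check itself is script-independent, but a rule that assumes words are separated by spaces still rejects names written with a different separator.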

Moreover, we can restrict UTF8 in attributes which are not defined as
<freeform> at this moment, to include only those subsets of UTF8 which
cover the alphabets used in the RIPE NCC service region.
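A restriction of this kind could be sketched roughly as follows. The Python standard library has no direct "script" property, so this sketch uses the prefix of each character's Unicode name as a heuristic. The script list and the function name uses_allowed_scripts are purely illustrative assumptions, not a statement of which alphabets the service region actually needs.

```python
import unicodedata

# Illustrative only: not a definitive list for the RIPE NCC service region.
ALLOWED_SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")


def uses_allowed_scripts(value: str) -> bool:
    """Heuristic: every alphabetic character must belong to an allowed script."""
    for ch in value:
        if not ch.isalpha():
            continue  # spaces, digits, punctuation are left to other checks
        # Unicode character names start with the script name,
        # e.g. "CYRILLIC CAPITAL LETTER DE".
        name = unicodedata.name(ch, "")
        if not name.startswith(ALLOWED_SCRIPTS):
            return False
    return True


latin = uses_allowed_scripts("Piotr Strzyzewski")
cyrillic = uses_allowed_scripts("Денис")
cjk = uses_allowed_scripts("丹尼斯")
```

With this list the Latin and Cyrillic values pass and the Chinese value is rejected, which is exactly the trade-off under discussion.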

I'm open to discussing this.

Best regards,
Piotr
