I noticed the other day that Money, a popular Ruby library for all things money, formats Serbian Dinar incorrectly. I spent several evenings learning and describing what the correct formatting would be and how to achieve it. The answer requires crowdsourcing, which you have probably figured out already.
TL;DR:
- Unicode categorised it all;
- ECMA 402 contains lots of wisdom;
- if you make a money formatting library, make sure it uses CLDR;
- if you know more than most about a less popular currency or language, check if CLDR is correct.
Money formatting depends on locale
If you ever thought about formatting money, you probably fell into a bunch of traps listed in the falsehoods programmers believe about prices. This list illustrates how countries have different currencies, and how different languages format currencies differently.
One falsehood some popular libraries still implement comes from reading the second statement as “we should format money depending on the currency”. This may or may not be obviously false to you, but I certainly believed it a few weeks ago. I was wrong.
Sidenote: falsehoods programmers believe is a genre, go explore it.
Number formatting is a part of orthography, which is part of writing, which is a representation of a language. Representations of different languages vary, and there can be multiple representations of the same language. In software we choose the writing system as part of setting the locale.
Locales bundle a set of standards, usually based on language and region or writing system. The list of locales is currently defined by Unicode. Things that are included in locales but are not part of writing systems include the standard paper size, temperature and other measurement units, and words. Things that are part of both are, for example, date and time, numbers, and currencies.
Sidenote: There are 9 locales for the Serbian language, 26 for Arabic, 121 for English, and 739 in total.
Example: same amount of money formatted for different locales:
$12,345.42 # en_US
12 345,42 $US # fr_FR
12 345,42 $ # ru_RU
USD 12,345.42 # es_MX, $ stands for Mexican Peso here
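A comparison like this one can be reproduced with the `Intl.NumberFormat` API built into JavaScript runtimes. This is a minimal sketch: `formatForLocales` is a hypothetical helper name, and the exact strings (including the kind of space used) depend on the locale data shipped with your runtime.

```javascript
// Format one amount in one currency for several locales.
// `formatForLocales` is a hypothetical helper, not part of any library.
function formatForLocales(amount, currency, locales) {
  const result = {};
  for (const locale of locales) {
    result[locale] = new Intl.NumberFormat(locale, {
      style: "currency",
      currency,
    }).format(amount);
  }
  return result;
}

const formatted = formatForLocales(12345.42, "USD", [
  "en-US",
  "fr-FR",
  "ru-RU",
  "es-MX",
]);

// Print one line per locale; the currency stays the same, the text does not.
for (const [locale, text] of Object.entries(formatted)) {
  console.log(locale, text);
}
```

The currency is fixed and only the locale changes, which is exactly the distinction the falsehood above misses.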
Certainly if there are standards, there are libraries to which you can feed the locale and a price, and they will format it for you. How do they know how to do that and avoid The Falsehoods?
The answer is crowdsourcing and curation.
There is a good dataset on locales
Unicode is the organization that once standardised text encoding (you know it through UTF-8) and has been adding new emoji to it for more than 10 years. It’s also the organization that builds the Common Locale Data Repository (CLDR).
Common locale data is many things. I suggest that you explore it. From the money formatting perspective, they have
- names and symbols for different currencies in different languages,
- how to write a number using words,
- how to round a currency,
- the format itself.
Sidenote: datasets I looked at are cldr-numbers-full and cldr-core -> supplemental -> currencyData.
That’s a lot of data that not many people would know about if you asked them on the street. Unicode relies on the help of many people, performs surveys, and has a Jira where anyone can file a ticket if they notice something wrong. Things change across the world and the dataset is not perfect, but it’s the closest to perfect I’ve seen, and everyone should use this data.
If you checked your language there and found a mistake, go complain on their Jira.
There is a good spec for money formatting
ECMA 402 is the companion to the JavaScript language specification that defines all things internationalisation, including number and currency formatting.
You can see an implementation of this spec in your browser of choice:
Intl.NumberFormat("sr-el", { style: "currency", currency: "RSD" }).format(12345.42)
// => "12.345,42 RSD"
That implementation is hopefully based on the Unicode CLDR dataset. Chrome, Safari and Firefox do that for sure, but maybe not every browser uses the latest data, and browsers probably post-process it differently, because people do see differences.
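To see where each piece of such a string comes from, `formatToParts` breaks the result into typed parts. A small sketch: `describeParts` is a hypothetical helper, and the parts you get for `sr-el` and `RSD` depend on your runtime's locale data, so no output is promised for them.

```javascript
// Return the typed parts of a formatted amount, e.g. currency, integer,
// group, decimal, fraction, literal. `describeParts` is a hypothetical helper.
function describeParts(locale, currency, amount) {
  return new Intl.NumberFormat(locale, { style: "currency", currency })
    .formatToParts(amount)
    .map(({ type, value }) => `${type}:${JSON.stringify(value)}`);
}

console.log(describeParts("en-US", "USD", 12345.42));
console.log(describeParts("sr-el", "RSD", 12345.42));
```

Comparing the part lists from two browsers is a quick way to find out whether a difference comes from the symbol, the separators, or the number of fraction digits.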
The spec defines a vast number of options for NumberFormat, which clearly come from facing the reality of many languages and areas where numbers are formatted. Look at the possible rounding modes: ceil, floor, expand, trunc, halfCeil, halfFloor, halfExpand, halfTrunc, halfEven. If I were making an i18n library, I would definitely implement this spec.
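The difference between the rounding modes is easiest to see on a tie such as 2.5 and its negative. The sketch below assumes a runtime that implements the `roundingMode` option of `Intl.NumberFormat`; older engines silently ignore it, so the code checks for support first. The helper names are hypothetical.

```javascript
const ROUNDING_MODES = [
  "ceil", "floor", "expand", "trunc",
  "halfCeil", "halfFloor", "halfExpand", "halfTrunc", "halfEven",
];

// Engines without roundingMode support ignore the option and do not
// report it back in resolvedOptions(), so this returns false for them.
function supportsRoundingMode() {
  const resolved = new Intl.NumberFormat("en", {
    roundingMode: "halfEven",
  }).resolvedOptions();
  return resolved.roundingMode === "halfEven";
}

// Round `value` to a whole number under every mode and collect the strings.
function roundWithEveryMode(value) {
  const result = {};
  for (const roundingMode of ROUNDING_MODES) {
    result[roundingMode] = new Intl.NumberFormat("en", {
      maximumFractionDigits: 0,
      roundingMode,
    }).format(value);
  }
  return result;
}

if (supportsRoundingMode()) {
  console.log(roundWithEveryMode(2.5));
  console.log(roundWithEveryMode(-2.5));
} else {
  console.log("roundingMode is not supported by this runtime");
}
```

The negative tie is where the modes named after a direction (ceil, floor) part ways with the ones named after magnitude (expand, trunc).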
I18n libraries are more than number formatting
When you think about an i18n library, you’re thinking about words first. Words are usually translated during app development, and the library’s job is usually to provide a convenient helper that takes a translation key, figures out the locale and returns a value for the key.
More mature ecosystems figured out common patterns across apps, put them into frameworks and components, and translated them. For example, Ruby on Rails has a whole set of translations that come with the framework, and more translations come from libraries. At this point people who are not aware of CLDR, or who started before it became a thing (which was in 2003), start to duplicate its work.
Please don’t. It’s a huge amount of work, and people have been refining it for 21 years now. Also, one day you’ll be put in a position to choose between your contributors’ data and CLDR data. Instead, try to make a mapping between your translation keys and CLDR data and automatically update your app based on that. If you already have conflicts, try to resolve them with the help of native speakers and either just override your translations or contribute to CLDR.
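One way such a mapping could look: derive the values for currency-format keys from the locale data behind `Intl.NumberFormat` instead of maintaining them by hand. This is only a sketch. The key names (`unit`, `delimiter`, `separator`, `format` with `%u` and `%n` placeholders) are modelled on Rails-style translations and are an assumption, `currencyFormatKeys` is a hypothetical helper, and negative amounts and non-Latin digits are ignored.

```javascript
// Derive Rails-style currency formatting keys for a locale and currency.
function currencyFormatKeys(locale, currency) {
  const parts = new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
    // Force two fraction digits so a decimal separator always shows up.
    minimumFractionDigits: 2,
    maximumFractionDigits: 2,
  }).formatToParts(12345.67);

  const numberTypes = new Set(["integer", "group", "decimal", "fraction"]);
  const valueOf = (type) =>
    parts.find((part) => part.type === type)?.value ?? "";

  // Collapse all number parts into a single "%n", turn the currency part
  // into "%u", and keep literals (such as a space) as they are.
  let format = "";
  let numberEmitted = false;
  for (const part of parts) {
    if (numberTypes.has(part.type)) {
      if (!numberEmitted) format += "%n";
      numberEmitted = true;
    } else if (part.type === "currency") {
      format += "%u";
    } else {
      format += part.value;
    }
  }

  return {
    unit: valueOf("currency"),
    delimiter: valueOf("group"),
    separator: valueOf("decimal"),
    format,
  };
}

console.log(currencyFormatKeys("en-US", "USD"));
console.log(currencyFormatKeys("de-DE", "EUR"));
```

Running something like this in CI and diffing the result against the hand-written translations would surface the conflicts early, before a contributor and CLDR disagree in production.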
Knowing all that, let’s get back to the thing that made me learn it and help with what I can.
The Internet is wrong about money formatting in Serbian
So, the Ruby Money gem formats Serbian Dinar incorrectly.
> Money.from_amount(12345.42, "RSD").format
=> "РСД12,345.42"
First off, it formats based on currency instead of locale. To be fair, it throws a warning about that and says that it’s going to move to locale-based formatting in the next major version. That’s a good and right move.
They are still going to support the old way of formatting though, so let’s see what is wrong with that and fix it.
The correct formatting would be 12.345,42 RSD. Differences are:
- Currency symbol is wrong (more on that later)
- Currency symbol should go after the number
- There should be a space between the number and the symbol
- Comma is used as a decimal, and dot is used as a thousands separator
How do I know that? I asked. I also googled but wasn’t convinced by the results.
Additionally, I noticed that the Unicode dataset lists RSD with 0 digits in the fractional part when it’s actually two. There are no para (Serbian cents) coins minted anymore, though, so if you pay in cash, the amount is rounded.
Clearly, I needed an authority. So I wrote to the National Bank of Serbia. And to the Institute for the Serbian Language. And to my friend, the philologist Borislava. Two out of three answered.
So here is the thing. Serbian has 9 locales, and even if we narrow it down to only two it’s confusing enough. The two are sr-el (Latin) and sr-ec (Cyrillic). If there is a Serbian language in a library, it’s usually the Cyrillic one.
The Serbian dinar ISO code is RSD. Your best bet for it to be recognized outside of Serbia is RSD. At the same time, Cyrillic text would use the Cyrillic notation РСД. Then there are symbols that are non-standard but people use anyway. I found 4 for each script: DIN, Din, Din., din., ДИН, Дин, Дин., дин. The currency symbol goes after the number.
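In `Intl.NumberFormat`, the choice between the ISO code and a locale-specific symbol or name is the `currencyDisplay` option. The sketch below assumes that the BCP 47 tags `sr-Latn` and `sr-Cyrl` are the equivalents of the two locales above. Which symbol each one prints comes from the runtime's locale data, so no particular output is promised here. `displayVariants` is a hypothetical helper.

```javascript
// Format the same amount with each way of displaying the currency.
function displayVariants(locale, currency, amount) {
  const variants = {};
  for (const currencyDisplay of ["code", "symbol", "name"]) {
    variants[currencyDisplay] = new Intl.NumberFormat(locale, {
      style: "currency",
      currency,
      currencyDisplay,
    }).format(amount);
  }
  return variants;
}

console.log(displayVariants("sr-Latn", "RSD", 12345.42));
console.log(displayVariants("sr-Cyrl", "RSD", 12345.42));
```

With `currencyDisplay: "code"` the ISO code is used regardless of script, which matches the advice to prefer RSD when the audience is outside Serbia.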
Serbian orthography for number formatting is described in a dedicated book, and extracts from it are posted across the Internet. Thanks to all these fine people, because it’s very inconvenient to grep books.
How to format numbers in Serbian: 12.345,42.
Rounding of dinars depends on whether you’re paying in cash or digitally. There are no para coins, so a cash bill is rounded to the nearest dinar: 0.50 -> 0, 0.51 -> 1. This rounding method is called Half Down and corresponds to halfTrunc in ECMA 402 (halfFloor gives the same result for non-negative amounts). For cashless operations, precision is 0.01 RSD. The Bank of Serbia told me that, and I can confirm it from the experience of living in Serbia. Also, this is where the Unicode dataset is slightly wrong, and I reported it to their Jira.
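The cash rule can be sketched in a few lines. This is an illustration under stated assumptions, not anyone's official algorithm: amounts are kept as integer para (1 RSD = 100 para) to avoid floating-point surprises, only non-negative amounts are handled, and `roundCashToDinars` is a hypothetical helper name.

```javascript
// Round a non-negative cash amount, given in para, to whole dinars.
// A tie (exactly 50 para) goes down; anything above 50 para goes up.
function roundCashToDinars(amountInPara) {
  if (!Number.isInteger(amountInPara) || amountInPara < 0) {
    throw new RangeError("expected a non-negative integer amount in para");
  }
  const para = amountInPara % 100;
  const dinars = (amountInPara - para) / 100;
  return para > 50 ? dinars + 1 : dinars;
}

console.log(roundCashToDinars(50)); // 0
console.log(roundCashToDinars(51)); // 1
console.log(roundCashToDinars(1234542)); // 12345
```

For cashless payments no rounding step is needed, since the amount already has 0.01 RSD precision.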
All in all, here are the PRs I created so far:
Call to Action
If you’re making an i18n library, thank you so much. It’s one of the things that requires the effort of many people and curation from someone with a lot of patience. Some of the things that can be automated, like number and date formatting, are already done in CLDR, and I suggest that you use this data if you don’t already. If there is conflicting data, please resolve the conflicts and contribute to CLDR.
If you use a library that can format money, please check that it does so based on locales, and please check that Serbian formatting is fine. If it’s not, please complain on their GitHub. It may mean that they don’t use CLDR, and you could complain about that as well.
Finally, if you live in a small country or speak a less popular language, go check if CLDR is correct about it.
Thanks everyone.
Sources:
- Unicode CLDR
- Spelling and punctuation are not part of grammar
- Falsehoods programmers believe about prices
- Wikipedia about locales
- The list of locales
- ECMA 402
- How to format numbers in Serbian
This post on hackernews: https://news.ycombinator.com/item?id=43601224