Wikifunctions:Status updates/2024-12-19
Function of the Week: age
Last week we introduced the Gregorian date type, and as of the time of writing, we have 23 functions using the new type. Thanks everyone for your contributions!
One of the flagship functions for Wikifunctions that we have mentioned in presentations and essays before is the age function. This function takes two dates as arguments and calculates the difference between them in full years. The first argument could be, for example, the date of birth of a person or the date an organization was founded. The function then returns the age of the person or organization, in full years, as of the date given in the second argument.
For example, Wikipedia was founded on 15 January 2001. On the day of publication of this newsletter, Wikipedia was 23 years old. The age function returns that natural number as its answer.
Why did we choose it as a flagship function? Because more than 160 Wikimedia projects have a template for this functionality, and more than 100 projects have a module for it. But in many cases, these templates and modules are copied and pasted from another project, underdocumented, not well tested, and almost never updated when the original is improved. And, as often as these templates and modules have been copied around, there are still more than 500 Wikimedia projects that don’t have access to that functionality.
One goal of Wikifunctions is to provide such functionality from a central repository: all projects should have automatic access to this functionality, in its most up-to-date form, well-tested, both through explicit function tests and through usage across many projects. No more copy-and-pasting from other wikis, no more content that the local community barely understands and has difficulty maintaining.
And now, on Wikifunctions, we have the age function (Z20756). It currently has four implementations and five tests. Since we have not yet configured a parser for dates, entering the arguments is a bit of a hassle. Nevertheless, I chose this function as the last Function of the Week for this year, to use the opportunity to highlight part of what Wikifunctions may mean for Wikipedia in the future.
The five tests are:
- someone born on Christmas last year will be 1 on Christmas this year
- someone born on Christmas 4 AD will be 2020 on Christmas this year
- someone born on Christmas last year will still be 0 on Christmas Eve this year
- someone born on 23 January 12 BC was 10 on 1 March 2 BC
- someone born on 1 January 5 BC was 37 on 3 April 33 AD
The tests cover a good range. In particular, the tests across the era change are important. It would be interesting to have agreements and tests for what happens when the first date is after the second, and tests for dates outside of JavaScript’s date range (the far future or past, more than 300,000 years away), but, other than that, the test coverage seems good.
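To make the five tests concrete, here is a minimal Python sketch of them. It is an illustration under assumptions, not the code behind Z20756: the names `age` and `astronomical_year` are made up for this example, dates are plain (year, month, day) tuples with a negative year standing for BC, and "this year" is taken to be 2024. Wikifunctions itself uses the Gregorian date type rather than tuples.

```python
def astronomical_year(year: int) -> int:
    """Map a signed historical year (negative = BC) to astronomical numbering.

    The BC/AD count has no year 0, so 1 BC becomes 0, 2 BC becomes -1, etc.
    (Hypothetical helper, introduced for this sketch only.)
    """
    if year == 0:
        raise ValueError("there is no year 0 in the BC/AD count")
    return year if year > 0 else year + 1


def age(born: tuple, on: tuple) -> int:
    """Full years between two (year, month, day) tuples.

    Hypothetical sketch: the result is negative if `born` is after `on`,
    a case the current tests leave open.
    """
    born_year, born_month, born_day = born
    on_year, on_month, on_day = on
    years = astronomical_year(on_year) - astronomical_year(born_year)
    # Subtract one if the anniversary has not yet been reached in the year of `on`.
    if (on_month, on_day) < (born_month, born_day):
        years -= 1
    return years


# The five tests from the list above, with "this year" assumed to be 2024.
assert age((2023, 12, 25), (2024, 12, 25)) == 1
assert age((4, 12, 25), (2024, 12, 25)) == 2020
assert age((2023, 12, 25), (2024, 12, 24)) == 0
assert age((-12, 1, 23), (-2, 3, 1)) == 10   # 12 BC to 2 BC
assert age((-5, 1, 1), (33, 4, 3)) == 37     # 5 BC to 33 AD, across the era change
```

The era-crossing tests are the ones that catch a missing year-0 adjustment: without `astronomical_year`, the last test would give 38 instead of 37.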
The four current implementations are:
- A composition that counts the days between the two dates, and divides them by 365 (which fails to account for leap years)
- Another composition that subtracts the first year from the second, and subtracts one more if the second date falls earlier in the year than the first. It does so by cleverly converting the condition to a number. This composition fails at the time of writing, though.
- A third composition that does the same, but using integers instead of natural numbers. This one works, but could benefit from using a few more high-level functions.
- An implementation in JavaScript that does basically the same: it subtracts the years, and if the first date falls later in the year than the second, it subtracts one more.
(Note: the first two compositions have been deleted since this text was written)
The implementations are interesting, not least because the first one is intended to fail in order to highlight the relevance of some of the test cases.
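The difference between the first strategy and the others can be sketched in a few lines of Python. This is a rough illustration, not the actual Wikifunctions implementations: the function names `age_by_days` and `age_by_years` are hypothetical, and the sketch uses Python's `datetime.date`, which only covers the years 1 to 9999, so BC dates are out of scope here.

```python
from datetime import date


def age_by_days(born: date, on: date) -> int:
    """Count the days between the dates and divide by 365 (ignores leap years)."""
    return (on - born).days // 365


def age_by_years(born: date, on: date) -> int:
    """Subtract the years, minus one if the anniversary has not been reached yet.

    The comparison yields a bool, and int(True) is 1, so the condition
    itself is converted to the number that gets subtracted.
    """
    before_anniversary = (on.month, on.day) < (born.month, born.day)
    return on.year - born.year - int(before_anniversary)


born = date(4, 12, 25)
on = date(2024, 12, 25)
print(age_by_days(born, on))   # 2021: the leap days add up to more than a year
print(age_by_years(born, on))  # 2020
```

Over two millennia the ignored leap days accumulate to more than a full year, which is why the test with the birth date in 4 AD is the one that exposes the day-counting approach.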
Call for Functions: Intros for year articles
The main goal of Wikifunctions is to support Abstract Wikipedia – meaningful, Wikipedia-style paragraphs and articles generated from data and abstract content. For that, we need to be able to create high-quality prose content for articles in many languages.
Many Wikipedias have a set of articles about individual years — for example, here is the article for the year 2023 in English Wikipedia. In most languages, the article starts with a few sentences with very similar content to what the English Wikipedia offers:
“2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd year of the 3rd millennium and the 21st century, and the 4th year of the 2020s decade.”
There is now a function that can create a text very similar to this one, but without the links and formatting: Intro for year in English, which creates the following text:
“2023 (MMXXIII) was a common year starting on Sunday of the Gregorian calendar, the 2023rd year of the Common Era (CE) and Anno Domini (AD) designations, the 23rd year of the 3rd millennium, the 23rd year of the 21st century, and the 4th year of the 2020s decade.”
This is currently only available in English, and it has fewer features than what English Wikipedia uses (for example, it doesn’t switch to the Julian calendar, and it doesn’t merge the counts of the year within the century and within the millennium when they are the same). It would be interesting to create similar functions for other languages as well, and so we are calling for functions to be written over the holidays, and will take stock at the beginning of the Gregorian calendar year 2025.
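For anyone who wants to try this for another language, here is a rough Python sketch of the pieces such a function needs: Roman numerals, the leap-year rule, the weekday of 1 January, and English ordinals. It is an illustration only and not the implementation of the Wikifunctions function; all names (`roman`, `ordinal`, `is_leap`, `intro_for_year`) are made up for this example. It always uses the past tense, stays on the Gregorian calendar, and only works for the years 1 to 9999 supported by Python's `datetime.date`.

```python
from datetime import date

# Fixed English names, so the output does not depend on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def roman(n: int) -> str:
    """Roman numeral for a positive integer."""
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def ordinal(n: int) -> str:
    """English ordinal such as 1st, 2nd, 3rd, 4th, 11th, 23rd."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def intro_for_year(year: int) -> str:
    """Hypothetical English intro sentence for a year article (AD years only)."""
    kind = "leap" if is_leap(year) else "common"
    weekday = WEEKDAYS[date(year, 1, 1).weekday()]
    # Millennia and centuries start with years ending in ...1, hence year - 1.
    millennium, in_millennium = divmod(year - 1, 1000)
    century, in_century = divmod(year - 1, 100)
    return (
        f"{year} ({roman(year)}) was a {kind} year starting on {weekday} "
        f"of the Gregorian calendar, the {ordinal(year)} year of the Common Era (CE) "
        f"and Anno Domini (AD) designations, the {ordinal(in_millennium + 1)} year "
        f"of the {ordinal(millennium + 1)} millennium, the {ordinal(in_century + 1)} "
        f"year of the {ordinal(century + 1)} century, and the {ordinal(year % 10 + 1)} "
        f"year of the {year // 10 * 10}s decade."
    )


print(intro_for_year(2023))
```

For 2023 this prints the same sentence as quoted above. Most of the English-specific parts sit in `ordinal`, `WEEKDAYS` and the sentence template, which is where a version for another language would differ.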
Another interesting task, for the medium term, would be to work on a function that is abstract, i.e. one that creates the right words for a given language, rather than the community on Wikifunctions having to hard-code them for each language. This would currently still be difficult, but by the end of the next quarter we should be able to get “decade” or “century” from Wikidata in many different languages, which will help us get there.
There are also three main caveats for the current work:
- Although we have improved the performance of the system, Wikifunctions is still prone to timeouts on larger compositions.
- There is a missing feature that keeps us from getting the label of an object.
- Admittedly, even if we had the label, that would not be sufficient in many languages, as we would rather need the Lexeme in order to get the appropriate inflection.
So as we can see, we are very close to being able to build functions that can generate texts for years, but there are a few blockers on our way. We will use these blockers in the coming months to make progress visible and to focus our development in order to enable these to work – and not only on Wikifunctions, but on Wikipedia as well.
I hope these thoughts serve both as a reflection on what we achieved this year and as an outlook on where we want to go next year.
Recent Changes in the software
This week's production release is the last one before the end of 2024, as Wikimedia has an end-of-year release freeze so that we don't deploy code when lots of people are away and unavailable. The next production release after this will be around 15 January 2025.
We've fixed the database code behind some of our special pages to not list Objects with talk pages twice – thanks Feeglgeef for reporting this (T381003). We've landed some preparatory database work that will in future allow us to list Functions that use particular Types, so you can find examples of how others have used them; expect this some time next calendar year (T301712).
We've adjusted the logic when loading content from the database so that it throws a clearer, more MediaWiki-standard error when somehow something invalid has been saved into the wiki (T381115). We've also added some better testing for invalid Z2K1 values, stopped the API hiding such invalid items, and fixed a couple of issues that meant this kind of broken content was challenging to fix (T381972). In another area, we've guarded against odd errors triggered from invalid content when pages are re-rendered, to avoid filling up production logs with confusing warnings going to the wrong people (T380446).
We've split one re-used i18n message so that it's possible to translate it properly (T373745), and deleted two old, now unused ones to avoid wasted translator effort – sorry for that!
On the developer side, we've upgraded the version of JSDoc used to generate our (rather limited) front-end JS docs, and the phan static analyser of our PHP, alongside all Wikimedia repos switching to the newer versions. We've also made an error from one of our test tools clearer, as part of preparing for updating our tests to cover a more modern version of our front-end framework.
We have added support for the Z1956/fvr language to Wikifunctions, as part of it being added to MediaWiki (T381894).
As always, please alert us if you run into any issues.
News in Types: Double-precision floating-point numbers
This week we are introducing the double-precision floating-point number type, also known as "float64" among friends, or simply "float". Unlike the other number types that we already have – natural numbers, integers, and rational numbers – floats are not necessarily precise. Instead, they are a compromise between precision, feasibility and efficiency, which was codified in a standard (IEEE 754) almost forty years ago.
The 64 in float64 indicates that each number takes 64 bits, which is how most programming languages store it. This is called double precision: single precision takes 32 bits, and half precision 16. I am tempted to tell you so much more about floating-point numbers, and all the cool features they have, but instead I will just point to the English Wikipedia article as a starting point.
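As a small illustration of those sizes, Python's standard `struct` module can pack a number in each of the three formats. This is just a sketch to show the bit widths; it says nothing about how Wikifunctions stores the type internally.

```python
import struct

# "e", "f" and "d" are struct's codes for half, single and double precision.
# The "<" prefix selects the standard sizes, independent of the platform.
for name, code in [("half", "e"), ("single", "f"), ("double", "d")]:
    bits = struct.calcsize("<" + code) * 8
    print(f"{name} precision: {bits} bits")

# The 64 bits that represent 0.1 as a double, shown as 16 hex digits.
print(struct.pack(">d", 0.1).hex())  # 3fb999999999999a
```

The trailing `a` in the hex output is the rounding at the last bit: 0.1 has no finite binary expansion, so the stored value is only the nearest representable double.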
What are floating-point numbers good for in Wikifunctions? We will need to see how the different number types work out. For many calculations, I expect us to prefer the precision that rational numbers offer. But the most pragmatic way to deal with irrational numbers is to use the approximation that floating-point numbers offer. Whether we’re dealing with roots, circles, sine waves, or logarithms, the results will often be difficult or impossible to calculate with the number types we already have, and a new type that balances approximation with precision is now available to deal with that issue.
One huge advantage of floating-point numbers is that they are standardized and the standard is widely implemented in hardware and available in many programming languages.
Floating-point numbers will potentially nudge us to find new patterns in how to write tests. Often, insisting on exact equality is counter-productive (one classic example is that in floating-point arithmetic, 0.1+0.2 is not equal to 0.3), and for some functions we might want to write tests that don’t rely on exact equality, but rather check that the result is close enough to the expected value. And that will be a useful pattern to have for the more complex and interesting types that await us in the future.
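Here is what such a "close enough" test could look like, sketched in Python with the standard library's `math.isclose`. The helper name `is_close_enough` and the chosen tolerance are assumptions made for this example; how tolerance-based tests will actually be written on Wikifunctions is still open.

```python
import math

total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False: exact equality fails


def is_close_enough(actual: float, expected: float, rel_tol: float = 1e-9) -> bool:
    """Hypothetical test helper: accept results within a relative tolerance."""
    return math.isclose(actual, expected, rel_tol=rel_tol)


print(is_close_enough(total, 0.3))                # True
print(math.sqrt(2) ** 2 == 2.0)                   # False: sqrt(2) is only approximated
print(is_close_enough(math.sqrt(2) ** 2, 2.0))    # True
```

A relative tolerance works well for most values, but for expected results of exactly zero an absolute tolerance (the `abs_tol` parameter of `math.isclose`) is needed instead.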
Newsletter taking a break
Over the next few days, most of the team will be off for the holiday season at the end of the year. Expect the next update in the week of 15 January 2025. The first Volunteers' Corner of the next year will be on 13 January 2025. We wish everyone peaceful days and see you in the new Gregorian calendar year!