Monday, May 28, 2012

The Universal Standard Library

When the revolution comes, the creators of standard libraries are going to have some explaining to do.

Below is the Top 20 from the (mostly meaningless) TIOBE popular programming languages chart, and the name of the function to transform a string to upper case in that language:

C: strupr
Java: toUpperCase
C++: transform (with toupper parameter)
Objective-C: uppercaseString
C#: ToUpper
PHP: strtoupper
(Visual) Basic: UCase
Python: upper
Perl: uc
JavaScript: toUpperCase
Ruby: upcase
Visual Basic .NET: ToUpper
PL/SQL: UPPER
Delphi/Object Pascal: uppercase
Lisp: string-upcase (Common Lisp)
Logo (really?): instruct turtle to take bigger strides
Pascal: uppercase
Transact-SQL: UPPER
Ada: To_Upper
Lua: upper

(This list was quickly researched so if I got your favourite language wrong, then please feel free to rip me apart in the comments.)

Even ignoring the capitalization of the function names, it astonishes me that there are very few cases where the name is the same in different languages. I'm the kind of demented person that works in different languages for different jobs and this diversity gets very annoying. It is not unusual for me to work in 3 or 4 different languages in a week and I can tell you that remembering the language syntax is much easier than remembering all of the different names of all of the different common functions.

Now you may think this is a minor gripe but the "upper case" example is obviously only one of hundreds of commonly used functions. The inconsistency is nothing if not consistent.

The Solution

The CSIRO (Australia) has calculated in a 2010 study that the amount of time wasted annually by the estimated six million professional software developers world-wide due to inconsistent standard library naming is equal to the GDP of Portugal [citation needed].

Being the nice guy that I am, I will single-handedly solve this issue once and for all by:

  • Creating a new set of namespaces, class names (if needed), function names and type signatures (if needed) that will be known as the "Universal Standard Library Specification" (USLS). The USLS committee will consist of one person - me. All decisions will be final.
  • Standard library implementors will implement the standard. This will be done initially with aliases and wrapper functions. Minor variations due to allowable characters in identifiers (e.g. - ? !) will be acceptable upon payment of a nominal fee per variation to the USLS committee.

Of course the USLS project will initially make the inconsistency problem even worse by introducing yet another set of names and polluting the internal consistency of existing code-bases. Making the situation much, much worse to solve the problem is unfortunately unavoidable during the transition period. The USLS committee expects the transition period to last for no more than 30 - 50 years before the USLS becomes commonplace.
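
For the curious, here is a rough sketch of what the alias-and-wrapper phase might look like in Python. The names string_to_upper and to_upper are purely hypothetical; the USLS committee has not ruled on any names yet.

```python
# Hypothetical USLS name; no such specification exists yet.
def string_to_upper(s: str) -> str:
    """Wrapper: delegate to Python's native str.upper."""
    return s.upper()


# An alias is simply a second name bound to the existing function.
to_upper = str.upper

print(string_to_upper("hello"))
print(to_upper("hello"))
```

The wrapper leaves room to paper over differences in behaviour later, while the alias costs nothing at call time. Either way the old name keeps working, which is exactly how the transition period gets to last 30 - 50 years.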

If you support my project please leave feedback in the comments. If the support is large enough then I will be starting a Kickstarter campaign in the coming days to raise money to buy the USLS a Nissan GTR. The powerful 4WD sports saloon will be required to carry USLS documents to standard library implementors in a timely manner.

17 comments:

Unknown said...

Does the foundation have any job openings for delivery drivers? If so I'd like to apply

Kang Seonghoon said...

Interesting, I thought virtually the same thing two years ago. My argument is actually slightly stronger (it considers not only naming but also semantics) and uses the design of a date/time interface as its example, which involves the following design decisions:

- Should we separate a date from a time or not?
- Should we separate a time interval from a time instant or not? (ISO 8601 separates them.)
- Should the date/time object be mutable or not? (The JavaScript Date object has a very bad design in this respect.)
- How much accuracy is required? Seconds? Milliseconds? Microseconds?
- What is the range of a date object? How should we report the out-of-range error if any? (e.g. exception, or a special object like NaN)
- Should we support leap seconds? If we do, what would be the semantics of date arithmetic involving a leap second?
- Should we support dates prior to the adoption of the Gregorian calendar?
- How should we implement timezone information? What about daylight saving time? Should we support a timezone-less time object (known as a "naive" time in Python, for example)?
- What timezones should we require the implementations to support? Is UTC enough?

I have even sketched a language-independent library design framework, which can be adapted to many existing languages while staying in keeping with the existing conventions of those languages. I think, though, that it is very hard to do perfectly, and my design never went past the brainstorming stage.

For what it's worth, my initial attempt is available at: http://noe.mearie.org/remaster/
The original blog post is also available, but it is not in English so you have to trust your favorite machine translator ;). Anyway it is available at: http://j.mearie.org/post/2404538394/language-independent-library-design

Josh said...

Having a single name for each function is a good idea, but I think that the standard should allow for each language to transform that name to match the prevailing naming style. (I know you mentioned allowed characters, but this goes beyond that.)

So if you settled on "string to upper" as the human-readable version of the function name, the standard should allow the following variations:

string_to_upper
string-to-upper
stringToUpper
StringToUpper

And let's not forget the OO fetishists:

string.to_upper
String.toUpper
"string" :toUpper
etc.

(Speaking of OO, are we allowed to drop the "string" part of the function name if the function operates on objects of type string?)

As long as the function name reads the same, I don't think that the format matters. Being able to follow existing naming conventions will be very important if you want people to actually adopt this.
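
A minimal sketch of that kind of transformation in Python, assuming a hypothetical helper called name_variants that takes the human-readable name as space-separated words (nothing here is part of any real standard):

```python
def name_variants(human_name: str) -> dict:
    """Render a space-separated name in several common naming styles."""
    # Assumes at least one word; an empty name would need its own rule.
    words = human_name.lower().split()
    return {
        "snake_case": "_".join(words),
        "kebab-case": "-".join(words),
        "camelCase": words[0] + "".join(w.capitalize() for w in words[1:]),
        "PascalCase": "".join(w.capitalize() for w in words),
    }


for style, name in name_variants("string to upper").items():
    print(f"{style}: {name}")
```

The OO forms would need one more rule to decide which word becomes the receiver, so they are left out of the sketch.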

daGrevis said...

I support you 100%!

Neil Mitchell said...

I just port the Haskell standard libraries to every platform - works well enough for me.

Erik said...

@Neil Mitchell: Haskell doesn't even have a function to make a string uppercase. It only has `map toUpper`...

haywire said...

I'm on board as long as everything is exactly the same as how it is in Python.

ferruccio said...

I'm on board as long as everything is exactly the same as how it is in C++.

Unknown said...

I am on board as long as you use the ANSI C name which is erm .... erm ... there is no such function :) You write a loop yourself calling toupper.

Doctor Lard said...

I am on board as long as we do everything in D.

Brian said...

Logo is not completely standardized; UCB Logo uses uppercase. Most modern Logos don't use turtle graphics for drawing text fonts anymore; they would too often crash together on the 'X's resulting in too many injuries.

Unknown said...

There already is an ISO standard and a Universal Library for nearly all of these languages.
It's the .NET Framework. Its main libraries are ISO standardized and nearly all the languages that you've mentioned have .NET implementations (C#, VB, C++, F#, Java, Perl, Lisp, Scheme, PHP, Python, Ruby, Delphi, JavaScript, Lua, Ada) or bindings (Objective-C and all of the above).

BTW, did you know that you can use Python/Ruby instead of JavaScript for HTML scripting if the user has Silverlight installed?

name for names sake said...

This seems to be a problem with the selfishness of language developers.

Maybe it's one of the reasons why SQL is still so common today.

The Lisp language developers seem to be the best at conforming, but that has also impacted its fragmentation.

Maybe in the future programming languages will be so highly abstracted that we will have menus for translating code into other languages.

Michael Kopinsky said...

<obligatory> http://xkcd.com/927/</obligatory>

Chad Brewbaker said...

It is annoying that functions as fundamental as prefix sum, all-nearest-smaller value, and suffix array are not standardized.

My two cents:

1) Read Stepanov's book Elements of Programming.

2) Abstract fundamental algorithms like sort to a simple interface. It should run just as easily on a 1000-machine AWS cluster as on a single-core processor. Assume some algorithms will use parallelism in their implementation.

Also, think about creating an order of preference for implementers.

sort(a), sort(a), object.sort(a), (sort a),

Also make a standardized set of prefixes like "USL_" to prevent clobbering existing libraries when needed.

3) You won't find consensus on regex, date, and geocoding. Make these optional.