I just stumbled over this question on Stack Overflow:
What is the difference between nil? and == nil?
There is no difference.
At least when looking at the observable outcome. And I prefer nil? because of readability. But that is just a matter of taste.
However, there is a slight difference in how this outcome is calculated.
nil?
nil? is a method defined on Object and NilClass.
Unless you mess with this implementation through monkeypatching, a nil? check is a simple method call.
For your own objects (unless they inherit from BasicObject, which does not implement nil?) the implementation stems from Object and will always return false.
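A quick sketch of the three cases just described. The class names Plain and Bare are made up for illustration:

```ruby
# Plain inherits nil? from Object; Bare inherits from BasicObject and has none.
class Plain; end
class Bare < BasicObject; end

puts nil.nil?                               # true  (NilClass#nil?)
puts Plain.new.nil?                         # false (the inherited default)
puts Bare.instance_methods.include?(:nil?)  # false (BasicObject has no nil?)
```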
== nil
a == b is just syntactic sugar for sending the == message to the left-hand side, passing the right-hand side as the sole argument. This translates to a.==(b) in the general case, and to a.==(nil) in our specific case.
At the Object level, == returns true only if obj and other are the same object.
So this returns true if two variables are pointing to the same object or if the receiving class has overridden the == method in another way. And subclasses are meant to override == to implement class specific behavior.
This also means that the performance depends on the implementation of ==, unlike nil?, which should not be overridden by subclasses.
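To make the dispatch visible, here is a deliberately contrived sketch. The AlwaysEqual class is hypothetical and only exists to show which == implementation gets called:

```ruby
# Hypothetical class whose == claims equality with everything.
class AlwaysEqual
  def ==(_other)
    true
  end
end

obj = AlwaysEqual.new

puts obj == nil  # true:  AlwaysEqual#== receives nil as its argument
puts nil == obj  # false: nil is the receiver, so nil's own == is used
puts obj.nil?    # false: Object#nil? is untouched by the override
```

So with an unusual == override the two checks can disagree, which is one more reason to leave nil? alone and keep == implementations well behaved.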
Another solution
In Ruby everything besides nil and false is considered truthy. If this interpretation fits your use case, then you can avoid the check and pass the object to the if clause directly:
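A minimal sketch of that idea. The method name truthiness is an assumption made for this example:

```ruby
# Hypothetical helper: relies on Ruby's truthiness instead of an explicit nil check.
def truthiness(value)
  if value
    "truthy"
  else
    "falsy"
  end
end

puts truthiness(nil)    # falsy
puts truthiness(false)  # falsy
puts truthiness(0)      # truthy
puts truthiness("")     # truthy
```

Note that false takes the same branch as nil here, so this only fits when you do not need to tell the two apart.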
Performance
I’d expect nil? to be as fast as == nil. And because == might be overridden by subclasses, performance should depend upon the receiver.
And omitting the check should be fastest.
Here is a simple benchmark to test my assumptions. As usual, I use the wonderful benchmark/ips gem by Evan Phoenix.
I did not expect nil? to be slower and am still looking into this. But one can see that if you go for the == check, then it should be faster to write nil == other instead of other == nil. The usual microbenchmark warnings apply.
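For a rough comparison without extra gems, the sketch below uses the standard library's Benchmark module rather than benchmark/ips. The helper name time_nil_checks, the labels and the iteration count are all assumptions, and the numbers will vary by Ruby version and machine:

```ruby
require "benchmark"

# Hypothetical helper: wall-clock seconds for `iterations` runs of each check.
def time_nil_checks(value, iterations)
  {
    "nil?"         => Benchmark.realtime { iterations.times { value.nil? } },
    "value == nil" => Benchmark.realtime { iterations.times { value == nil } },
    "nil == value" => Benchmark.realtime { iterations.times { nil == value } },
    "bare if"      => Benchmark.realtime { iterations.times { true if value } }
  }
end

time_nil_checks(Object.new, 1_000_000).each do |label, seconds|
  puts format("%-14s %.4fs", label, seconds)
end
```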
Consider a Rails application with a search feature. Type in a name and it lists the matching Artists. Sounds simple.
What if we also want to search for the city the artists were born in? Simple as well: add a join, include the city name in the where clause. Done.
But what if we not only want to search for the Artists but also the Albums? And have those included in the result list as well?
Usually I try to go without additional libraries or even additional services whenever possible. Because every dependency comes at a cost. And honestly there are quite a few dependencies in Rails already. So let’s try and see how far we can get without any additional gem.
The models:
Somewhat contrived but you should get the idea.
The simple solution
Just run two queries. As simple as it can get. Perhaps wrap it in an object so your controller stays clean and the view as well:
It works but starts to get complicated as soon as you have pagination or you want/need to display the results in the same list. How do you merge those results?
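The wrapper object could look roughly like the sketch below. To keep it runnable without Rails, Artist and Album are plain Struct stand-ins and the "queries" are in-memory filters. In a real application each method would run a database query instead, and the class name CombinedSearch is an assumption:

```ruby
# Hypothetical stand-ins; in a Rails app these would be ActiveRecord models.
Artist = Struct.new(:name)
Album  = Struct.new(:title)

# Wraps both lookups so the controller and view deal with a single object.
class CombinedSearch
  def initialize(term, artists:, albums:)
    @term = term.downcase
    @artists = artists
    @albums = albums
  end

  # First "query": artists whose name contains the term.
  def artists
    @artists.select { |artist| artist.name.downcase.include?(@term) }
  end

  # Second "query": albums whose title contains the term.
  def albums
    @albums.select { |album| album.title.downcase.include?(@term) }
  end
end

search = CombinedSearch.new(
  "blue",
  artists: [Artist.new("Blue Example Trio"), Artist.new("Red Sample")],
  albums:  [Album.new("Shades of Blue"), Album.new("Green Fields")]
)
p search.artists.map(&:name)  # ["Blue Example Trio"]
p search.albums.map(&:title)  # ["Shades of Blue"]
```

The two result lists stay separate, which is exactly what makes shared pagination or a single merged list awkward.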
The DB-view solution
Another solution that does not need any external dependencies: database views. A database view can be seen as a predefined select statement that is accessible like a table. Depending on your database you can use different view types (for example materialized views), but for this sample I want to keep it simple.
We create a database view which acts as a reverse index. It combines all the attributes we want to be part of the search and adds a reference back to the model. To include multiple models in the view we can use union, which combines results from multiple tables.
This is the migration which will create the view:
Now we need to switch the schema dump format because raw SQL statements are not reflected in schema.rb.
Add the following line to your application.rb
and then you’ll have a structure.sql instead of schema.rb after you run the migrations.
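The setting meant here is presumably Rails' schema_format option. A sketch of how it is usually placed, where MyApp is a placeholder for your application's module name:

```ruby
# config/application.rb (MyApp is a placeholder, not a name from this post)
module MyApp
  class Application < Rails::Application
    # Dump the schema as raw SQL (db/structure.sql) instead of db/schema.rb
    config.active_record.schema_format = :sql
  end
end
```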
Multiple things to notice:
This is for SQLite; some functions might differ in other databases (string concatenation, max)
Because an Artist can have multiple Nicknames we need to group the results. In order to get all the nicknames in our reverse_index column we use GROUP_CONCAT
I added a label column to avoid N+1 selects when displaying the results
The searchable_id and searchable_type columns are named like this to make use of Rails’ polymorphic belongs_to association
The different selects must return the same number of columns in the same order
There is an updated_at column. I’ve added it to have a value I can use for ordering
With this table we can create a Search model and use it to search inside all Artist and Album records.
That’s it. You can now use it like:
Some notes about this solution:
The view does not have an id column, default ordering will not work
Weighting attributes is not possible. Weighting can be used to improve the order of hits. For example, when searching for “john”, a match on the name “John Doe” should rank higher than a match on the company name “Johnson & Peterson”
Performance: union can be costly, consider using union all
The takeaway
Depending on your needs there might be a simple solution that does not depend on additional libraries. Don’t be afraid of SQL.
A simple question but hard to answer. Why is that hard? Because Ruby has various ways of defining a method and adding it to a class:
Adding it to the singleton class
Adding it to the class
Include a module
Prepend a module
Extend a module
Inherit from superclass
If this sounds complicated to you then that’s because it is.
So first rule: try to avoid such situations where you have a multitude of classes and modules defining the same method.
If you have more than two definitions of a method then you most likely have bigger problems than knowing about the lookup path.
Also I haven’t seen many good uses of adding a method to the singleton class so far.
So how do we go about finding the lookup path? How about a small piece of code that answers this question?
What does this code do? It defines a method call for each of the six possibilities described above. Each one prints some debugging info and then forwards the call to super. Since the call method is not implemented at the end of the hierarchy, I added rescue nil. Of course, this would only be required for the last element in the hierarchy, but we don’t know which one that is yet. Let’s run the code and see the output:
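A sketch of what such a script could look like, with its output in the trailing comment. The module and class names are assumptions (only Klass appears elsewhere in this post), and the calls are collected in a CALL_ORDER array before being printed:

```ruby
CALL_ORDER = []

module Extended
  def call
    CALL_ORDER << "Extended"
    super rescue nil
  end
end

module Prepended
  def call
    CALL_ORDER << "Prepended"
    super rescue nil
  end
end

module Included
  def call
    CALL_ORDER << "Included"
    super rescue nil
  end
end

class Superclass
  def call
    CALL_ORDER << "Superclass"
    super rescue nil # nothing further up implements call
  end
end

class Klass < Superclass
  include Included
  prepend Prepended

  def call
    CALL_ORDER << "Klass"
    super rescue nil
  end
end

obj = Klass.new
obj.extend(Extended)

# Defined directly on this one object, i.e. on its singleton class.
def obj.call
  CALL_ORDER << "singleton class"
  super rescue nil
end

obj.call
puts CALL_ORDER.join(" -> ")
# singleton class -> Extended -> Prepended -> Klass -> Included -> Superclass
```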
What if you extend or include or prepend multiple times? The last definition comes first. That is if you have:
then the definitions from Baz will take precedence.
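A small sketch of that rule. Baz is the name used above, while Foo and Bar are assumed names:

```ruby
module Bar
  def call
    "Bar"
  end
end

module Baz
  def call
    "Baz"
  end
end

class Foo
  include Bar
  include Baz # included last, so it sits closest to Foo in the lookup path
end

puts Foo.new.call           # Baz
p Foo.ancestors.first(3)    # [Foo, Baz, Bar]
```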
And of course if you do not call super then none of the other implementations will be called.
So, now that this is solved… let’s look at another way this can be determined: ancestors. The documentation says that this “Returns a list of modules included in mod (including mod itself).”
If we extend the above code to print the list of ancestors:
Then we can see the following:
This is the order that we determined before, but it is not complete. We are missing the methods that have been added to the singleton class. Those can be seen if we check the singleton_class instead (note that this will create the singleton class if it does not yet exist):
This will print the full list of ancestors:
The #<Class:#<Klass:0x007fe34b225480>> is the singleton class. It exists solely for this object:
This ancestry also shows how Ruby looks up methods. It does not make complicated decisions about where to look first. It just walks up the hierarchy and calls the first matching method it can find. So if the singleton class does not respond to the method, then the extended modules are checked, then the prepended modules, and so on until the root is reached.
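Both views of the ancestry can be compared with a sketch like this one. Apart from Klass, the names are assumptions, and the hex address in the output differs on every run:

```ruby
module Extended; end
module Prepended; end
module Included; end
class Superclass; end

class Klass < Superclass
  include Included
  prepend Prepended
end

obj = Klass.new
obj.extend(Extended)

# Class-level view: nothing that lives on the singleton class shows up.
p Klass.ancestors.first(4)
# [Prepended, Klass, Included, Superclass]

# Object-level view: the singleton class first, then the extended module.
p obj.singleton_class.ancestors.first(6)
# [#<Class:#<Klass:0x...>>, Extended, Prepended, Klass, Included, Superclass]

# Every object gets its own singleton class.
p obj.singleton_class.equal?(Klass.new.singleton_class)  # false
```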
In Ruby there are several objects that respond to call. I usually refer to them as callables:
Proc
Lambda
Method
There are various ways that you can invoke those callables:
.call()
[]
.()
.===
I’ve talked about the case equality operator === over here.
But what about the other ones? IMHO you should stick to .call() because it does not require knowledge about a special syntax.
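To see that the four styles are interchangeable, here is a small sketch. The helper invoke_all_ways is made up for this example, and Method#=== needs Ruby 2.5 or newer:

```ruby
# Hypothetical helper: invoke a callable in each of the four styles.
def invoke_all_ways(callable, argument)
  [
    callable.call(argument),
    callable[argument],
    callable.(argument),
    callable === argument # === invokes the callable, it does not compare
  ]
end

double_proc   = proc { |x| x * 2 }
double_lambda = ->(x) { x * 2 } # a lambda is a Proc with strict argument checks
double_method = 2.method(:*)    # Method object wrapping Integer#*

[double_proc, double_lambda, double_method].each do |callable|
  p invoke_all_ways(callable, 21)  # [42, 42, 42, 42]
end
```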
Recently we had to deal with importing about 120 MB of XML data. Daily. It was split up into files of around 6 MB each.
Those files contain information about employees, with around 60 attributes per employee and around 5000 employees per file.
We need to read the files and create/update a record in the DB with the values from the XML file.
An XML file looked something like this:
Basically a CSV file in XML format…
Processing these files took ~16h. Whaaat?
I don’t need to mention that there was a bug in our code, do I? (We accessed the nodes through NodeSet; see the benchmark samples.) By fixing it we were able to cut the time down to about 20 min. Not bad.
Looking further into the code with ruby-prof, I found that a big part of the time was still spent in XML parsing. So I set out to build a benchmark for our use case. I was wondering if we could improve performance by replacing Nokogiri with one of the alternatives:
As Mike Perham described in his Kill Your Dependencies article, relying on fewer libraries (and using the standard library instead) is better. Since Rails already depends on it, I always resort to Nokogiri for XML parsing, even though REXML works and could be used wherever performance does not matter, instead of adding another dependency to your lib.
A simplified benchmark can be found here. I was surprised that OGA (pure Ruby) was that much faster than Nokogiri. OX tops that and is ~12 times faster than Nokogiri.
So it looks like I could improve the performance once more by switching to OGA (preferred, since pure ruby) or OX. Next step is to test the performance in real life with real input under real conditions.
Note: It looks like OX returns nil for empty bodies, whereas Nokogiri returns an empty string.