Performance Q&A
How does Hibernate perform? We claim that Hibernate performs well, in the sense that its performance is limited by the underlying JDBC driver / relational database combination. (Generally the overhead is much less than 10% of the JDBC calls.) You should be able to convince yourself of this by comparing performance between a "slow" database like Postgres and a "fast" one like HypersonicSQL. Given this, the question boils down to: does Hibernate implement its functionality using a minimal number of database queries? This is a more difficult question to answer (see below).
But how is that possible, if Hibernate uses so much runtime reflection? Many former C or C++ programmers prefer generated-code solutions to runtime reflection. This is usually justified by reference to the performance red herring. However, modern JVMs implement reflection extremely efficiently and the overhead is minimal compared to the cost of disk access or IPC. Developers from other traditions (e.g. Smalltalk) have always relied upon reflection to do things that C/C++ needs code generation for. In the very latest versions of Hibernate, "reflection" is optimised via the CGLIB runtime bytecode generation library. This means that "reflected" property get / set calls no longer carry the overhead of the Java reflection API and are actually just normal method calls. This results in a (very) small performance gain.
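To make the comparison concrete, here is a minimal sketch of a reflective property read next to the equivalent direct call. The `Customer` class and the `readProperty` helper are hypothetical names invented for this illustration; this is not Hibernate's own accessor code, only the general technique a reflection-based persistence layer relies on.

```java
import java.lang.reflect.Method;

// Hypothetical persistent class, used only for this illustration.
class Customer {
    private String name;
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class ReflectionSketch {
    // Reads a JavaBean-style property through the Java reflection API,
    // the way a generic persistence layer can without generated code.
    static Object readProperty(Object target, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read property " + property, e);
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setName("Ada");
        System.out.println(readProperty(customer, "name")); // reflective read, prints "Ada"
        System.out.println(customer.getName());             // direct call, prints "Ada"
    }
}
```

Both calls return the same value; the reflective path only adds a method lookup and an indirect invocation, which is small next to a database round trip.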
Okay, so what are the advantages of reflection then? A quicker compile-build-test cycle. The advantage of this should not be understated. The philosophy of Hibernate is this: let the developer spend as little time as possible implementing persistence for the 95% of the application which is used 5% of the time. Then, later, if there are performance issues with the remaining 5%, there will be plenty of time left for hand-coding JDBC calls to improve performance of particular bottlenecks. (Most of the time Hibernate very closely approaches the performance of hand-coded JDBC anyway.)
But how does it scale? Hibernate implements an extremely high-concurrency architecture with no resource-contention issues (apart from the obvious one: contention for access to the database). This architecture scales extremely well as concurrency increases in a cluster or on a single machine.
A more difficult question is how efficiently Hibernate utilizes memory under heavy load. Since there is no sharing of objects between concurrent threads (as with EJB), and since Hibernate does not automatically do instance-pooling (unlike EJB), you might think that memory utilization would be less efficient than some other solutions like EJB and JDO. However, our experience with real Java applications is that the benefits of instance-pooling are almost negated by common Java coding style. Very often programmers create a new HashMap in ejbLoad, or return a new Integer from a method call, or do some string manipulation. Furthermore, every time you load and then passivate a bean, every non-primitive field of the bean becomes garbage, not to mention whatever garbage the JDBC driver leaves behind. All these kinds of operations leave behind as much garbage as we avoided by doing instance-pooling. Furthermore, in my testing using JProbe, I have found that JDBC drivers produce one to two orders of magnitude more garbage than Hibernate itself! So I'm not losing sleep over this.
Please note that Hibernate is not a competitor to EJB. You can use Hibernate as a bean-managed persistence mechanism. Alternatively, you can use vanilla Hibernate objects from inside a session bean (our recommended architecture).
The other side of scalability is downward scalability. While it wasn't designed with small devices in mind, Hibernate nevertheless has a small footprint and could be used on machines with much less memory than you would need to run an application server. If a machine can run a JVM and a database, it should be able to run Hibernate.
Why not implement instance-pooling anyway? Firstly, it would be pointless. There is a lower bound to the amount of garbage Hibernate creates every time it loads or updates an object: the garbage created by getting or setting the object's properties using reflection. More importantly, the danger of instance-pooling is that developers forget to reinitialize fields each time an instance is reused. We have seen very subtle bugs in EJBs that don't reinitialize all fields in ejbCreate. On the other hand, if there is a particular application object that is extremely expensive to create, you can easily implement your own instance pool for that class and use the version of Session.load() that takes a class instance. Just remember to return the objects to the pool every time you close the session.
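A hand-rolled pool of the kind described above might look like the following sketch. `InstancePool`, `Report` and the reset callback are hypothetical names introduced for illustration, and the Hibernate call itself appears only as a comment, since the sketch is meant to run without Hibernate on the classpath. The reset step is the important part: it guards against exactly the stale-field bug mentioned above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical object that is expensive to create.
class Report {
    String title;
    final StringBuilder body = new StringBuilder();
}

// Minimal thread-safe pool: instances are reset before they become reusable.
class InstancePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final Consumer<T> reset;

    InstancePool(Supplier<T> factory, Consumer<T> reset) {
        this.factory = factory;
        this.reset = reset;
    }

    // Hands out an idle instance, or creates one if the pool is empty.
    synchronized T acquire() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Clears leftover state, then makes the instance available again.
    synchronized void release(T instance) {
        reset.accept(instance);
        idle.push(instance);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}

class PoolSketch {
    public static void main(String[] args) {
        InstancePool<Report> pool = new InstancePool<>(Report::new, r -> {
            r.title = null;
            r.body.setLength(0);
        });
        Report first = pool.acquire();
        // Here the pooled instance would be populated by the variant of
        // Session.load() that takes an existing instance (see the text above).
        first.title = "Q1";
        pool.release(first); // do this whenever the session is closed
        Report second = pool.acquire();
        System.out.println(second == first); // prints "true": the instance was reused
        System.out.println(second.title);    // prints "null": stale state was cleared
    }
}
```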
So DOES Hibernate implement its functionality using a minimal number of database queries? Good question. There's one occasion where Hibernate issues more queries than you would probably use if you coded the functionality by hand: mass updates or mass deletes. For this kind of operation, it makes sense to resort to hand-coded SQL. On the other hand, Hibernate can make certain optimizations:
- Caching objects. The session is a transaction-level cache of persistent objects. You may also enable a JVM-level JCS cache to memory and / or local disk.
- Executing SQL statements later, when needed. The session never issues an INSERT or UPDATE until it is actually needed. So if an exception occurs and you need to abort the transaction, some statements will never actually be issued.
- Never updating unmodified objects. It is very common in hand-coded JDBC to see the persistent state of an object updated just in case it changed; for example, the user pressed the save button but may not have edited any fields. Hibernate always knows if an object's state actually changed.
- Efficient Collection Handling. Likewise, Hibernate only ever inserts / updates / deletes collection rows that actually changed.
- Rolling two updates into one. As a corollary to the first and third points above, Hibernate can roll two seemingly unrelated updates of the same object into one UPDATE statement.
- Updating only the modified columns. Hibernate knows exactly which columns need updating and, if you choose, will update only those columns.
- Outer join fetching. Hibernate implements a very efficient outer-join fetching algorithm!
- Lazy collection initialization.
- Lazy object initialization, via CGLIB proxies.
Here are a few more (optional) features of Hibernate that your hand-coded JDBC may or may not currently benefit from:
- efficient PreparedStatement caching (Hibernate always uses PreparedStatement for calls to the database)
- JDBC 2 style batch updates
- Pluggable connection pooling
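Three of the optimizations listed above (never updating unmodified objects, rolling two updates into one, and updating only the modified columns) all rest on comparing an object's current state with a snapshot taken when it was loaded. The sketch below illustrates that idea under simplifying assumptions: state is modelled as a map from column name to value, and `dirtyColumns` and `updateSql` are hypothetical helpers, not Hibernate's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DirtyCheckSketch {
    // Returns the columns whose current value differs from the loaded snapshot.
    static List<String> dirtyColumns(Map<String, Object> snapshot, Map<String, Object> current) {
        List<String> dirty = new ArrayList<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(snapshot.get(entry.getKey()), entry.getValue())) {
                dirty.add(entry.getKey());
            }
        }
        return dirty;
    }

    // Builds one parameterized UPDATE covering only the dirty columns,
    // or returns null when nothing changed and no statement is needed.
    static String updateSql(String table, String idColumn, List<String> dirty) {
        if (dirty.isEmpty()) {
            return null;
        }
        return "UPDATE " + table + " SET " + String.join(" = ?, ", dirty)
                + " = ? WHERE " + idColumn + " = ?";
    }

    public static void main(String[] args) {
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("name", "Ada");
        snapshot.put("email", "ada@example.org");
        snapshot.put("city", "London");

        // Two separate modifications made during the same unit of work.
        Map<String, Object> current = new LinkedHashMap<>(snapshot);
        current.put("email", "ada@example.com");
        current.put("city", "Paris");

        // prints "UPDATE customer SET email = ?, city = ? WHERE id = ?"
        System.out.println(updateSql("customer", "id", dirtyColumns(snapshot, current)));
    }
}
```

Because the comparison happens once, when the changes are finally written, any number of in-memory modifications to the same object collapse into a single statement, and an untouched object produces none.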
Hopefully you will agree that Hibernate approaches the parsimony of the best hand-coded JDBC object persistence. As a postscript, I would add that I have rarely seen JDBC code that approaches the efficiency of the "best possible" code. By contrast, it is very easy to write efficient data-access code using Hibernate.
Conclusion? It turns out that Hibernate is very fast. You are welcome to verify this for yourself.