Boiling Java Application Design

Wednesday, September 02, 2009

What will the next generation Operating System look like?

A few days ago I had a chat with my friends, and the topic turned to the latest trends in operating system (OS) evolution. Web-based applications are getting more and more powerful, so casual users will rely on them more in the future, which means they will need to install fewer offline applications on their own hard disks. The development of small, relatively low-end laptop computers (netbooks) is a good indication of this phenomenon. Given this trend, I have made a bold prediction about the OS we will see in 5 to 10 years' time. You may or may not agree with my points; all I want is to start a discussion.

Web Browser Everywhere

More than 10 years ago, when someone saw a first generation web browser like Mosaic, he or she might not have imagined how many functions it and the Internet could provide. Now we send email, edit documents, watch videos, and play games with just a web browser on our own computers. I expect the web browser to become a more advanced "shell" in our operating system (like "cmd" in Windows or ksh in Unix), since every OS needs a shell process to run applications. Even better, web browsers on different OSes all agree on a set of open standards such as HTML and JavaScript. That means such an architecture brings another level of "write once, run anywhere" to application developers. As web browsers continue to evolve, we will have even more powerful and stable web-based applications.

So will the future OS really be just a web browser and nothing else? Will users really have no need to install any software locally? No, but things will work in a different way.

All Applications are Web Based

You may love editing your documents on Google Docs, but you may not want to put any copy of a document on the Internet for security reasons. Someday you may be able to install a local version of Google Docs on your own computer. Your OS will come with an application server that supports the most commonly used web application architectures, such as Java EE, PHP, and maybe .NET. Installing new software on your computer will actually mean installing a new web application package on the application server in your OS. You will still use the application through a web browser, with a look and feel very similar to the online version, but in fact it only connects to your local application server. Applications that require a lot of runtime resources will be deployed in this way.

Internet as Storage

As Internet bandwidth becomes cheaper, we'll find that buying storage on the Internet is more cost effective than buying local storage for our own computers. At some point in the future, uploading and downloading files to and from an Internet storage provider will be fast enough that the file explorer in our OS can access Internet storage seamlessly, and users can use it just like a local drive. When users want more storage space, they will not buy a new local hard disk but will pay the storage provider for extra space. The storage provider will bear the responsibility for maintaining the storage infrastructure, backing up the users' data, and keeping the data as secure as possible. So there will be no point in keeping your files on a local hard disk unless they are very confidential.

For example, if you buy an Eee PC from Asus today, they provide 2 GB of Internet storage space to the buyer. The file explorer of the Eee PC can mount the storage space as a drive, so managing files on it is just like managing files on a local hard disk (provided that your Internet connection is fast enough).

Local Database for Application

Almost all complicated applications need a database of some sort. Since the future OS will still contain local web-based applications, it should also provide some kind of local database engine, such as MySQL, or a file-based database such as Derby. When a user installs a local application, the installation program will create a new local database if necessary.

Conclusion
At this moment, I think the nearest thing I can find on the market is Linux. Maybe in a few years' time a new Linux distribution will have some of the features I predicted. The new OS from Google is another hope, since this kind of OS would be a perfect match for their current services. What do you think?

Wednesday, July 22, 2009

Consideration of running batch programs without Stored Procedures

A few days ago, I joined a discussion with one of my company's customers about software application architecture. Once again, the good old debate about using stored procedures (SP) was brought to the table.

The customer, who used stored procedures quite intensively for database operations in their legacy systems, seems to be giving in this time. They agreed that for online transactions, DB operations can be implemented in a Java-based Data Persistence Layer (DPL), such as DAOs or entity EJBs. For batch programs, however, we proposed that stored procedures can still be used. The customer seems happy. After all, this design looks natural and low risk, and the DB programmers hired by the customer won't lose their jobs.

Running batch programs with SP seems a very natural decision. Batch programs usually involve processing a large amount of data in the database, and the time window for a batch program to run is usually very limited. Many application systems can only allocate 8 hours a day for running dozens of batch jobs, so performance is a key issue. SP is usually the fastest answer to this type of requirement.

Actually, from my experience, SP may not always help. In my career I've seen a weekly batch program with thousands of SQL scripts take more than 24 hours to finish. The problem is that no matter how fast SP can be executed, the resources available to SP programs are only the CPU and memory of one DB server.

Believers in object-oriented programming have long been urging us to put all persistence logic in the DPL. Performance aside, using SP to update data in a batch process still defeats the whole purpose of using a DPL. After some time, the old data integrity issues will emerge again, as if no DPL were being used. So if you think the hybrid solution I proposed to my client is the ultimate answer, you are in fact fooling yourself.

So I thought again and again about how a batch program can be implemented in Java while all requirements, mainly performance, are still met. I have some answers below. If you are an OOP person, please read them and kindly share your thoughts with me. If you are an SP person, please let me know how you can maintain a batch program written in SP that needs more than 24 hours to run without a database redesign.

Problem 1: Performance

As I mentioned above, performance is usually the biggest hurdle to implementing batch programs outside the database. Why does this have to be slower? The answer is simple. The system has to send the data from the DB to another server through the network, transform the data into another structure (e.g. Java objects), and write the result back to the DB through the network again. Compared with using SP, all of this is additional overhead. We cannot completely avoid this overhead, but there is one way to compensate for it.

When you run a batch program with SP, you have to use the CPUs in the DB server. No matter how expensive your server box is, you usually have only 2 or at most 4 CPUs. With this limited computation power, they have to perform all the operations, including queries, calculations, and updates.

When you run your batch program outside your DB server, you are actually moving your calculation to other CPUs. Not only do you have more CPUs to do the job, since different servers can run independent batch programs, but your DB server also has less work to do, so queries and updates go faster. With proper design, I think the performance will be at least comparable with the traditional SP approach.
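The idea of spreading the calculation over more CPUs can be sketched in Java roughly as follows. This is only a minimal illustration under stated assumptions: the class name PartitionedBatch, the method names, and the per-record calculation are all hypothetical, and a thread pool on one machine stands in for what could equally be several batch servers. A real job would load and save records through the DPL inside processRange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedBatch {

    // Hypothetical per-chunk work; the modulo sum is only a stand-in for business logic.
    static long processRange(long fromId, long toId) {
        long sum = 0;
        for (long id = fromId; id < toId; id++) {
            sum += id % 7;
        }
        return sum;
    }

    // Splits the ID range [0, totalRecords) into chunks and processes them on a worker pool.
    static long runBatch(long totalRecords, long chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long start = 0; start < totalRecords; start += chunkSize) {
                final long from = start;
                final long to = Math.min(start + chunkSize, totalRecords);
                Callable<Long> task = () -> processRange(from, to);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for the chunk and surfaces any failure
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Batch chunk failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each chunk is independent, the same partitioning could be used to hand ranges to different servers instead of threads; the DB server then only answers queries and updates.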

Problem 2: Memory Management

Surprisingly, there is another factor that makes batch processing outside the DB difficult, or sometimes seemingly impossible: memory management.

It is very common for a DB to contain many gigabytes or even terabytes of data. In SP, it is relatively easy to create a temporary data set to store transient data for calculation, and the DB can usually handle this without much difficulty. However, for a program outside the DB, the memory available is usually a few gigabytes, or less than 2 GB for a Java program on a 32-bit JVM. If your program has to sort or group a large amount of data, you may easily hit an out-of-memory error.

The answer to this problem is to save the data in temporary files on disk. For example, in Java you may implement common data structures such as List or Map so that they serialize their content and save it in temp files. As these actions are wrapped inside the data structure class, reading and writing the temp files is hidden away from the business logic.
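A minimal sketch of such a disk-backed structure, assuming a simple append-and-read list, might look like the following. The class name SpillingList and its chunking scheme are my own illustration, not an existing library; it keeps only the newest chunk in memory and serializes full chunks to temp files.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SpillingList<T extends Serializable> {
    private final int chunkSize;
    private final List<File> chunkFiles = new ArrayList<>();
    private ArrayList<T> buffer = new ArrayList<>();
    private long size;

    SpillingList(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    // Appends an item; a full in-memory chunk is written to a temp file.
    void add(T item) {
        buffer.add(item);
        size++;
        if (buffer.size() == chunkSize) {
            spill();
        }
    }

    long size() {
        return size;
    }

    // Reads an item either from the in-memory buffer or from the chunk file holding it.
    T get(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        int chunk = (int) (index / chunkSize);
        int offset = (int) (index % chunkSize);
        List<T> source = chunk < chunkFiles.size() ? readChunk(chunkFiles.get(chunk)) : buffer;
        return source.get(offset);
    }

    private void spill() {
        try {
            File file = File.createTempFile("batch-chunk-", ".ser");
            file.deleteOnExit();
            try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(buffer);
            }
            chunkFiles.add(file);
            buffer = new ArrayList<>();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private List<T> readChunk(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<T>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The business logic only calls add, get, and size and never sees the files. Note that this sketch rereads a whole chunk on every get from disk, so a real implementation would want to cache the most recently read chunk, especially for sequential scans.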

Conclusion

After the discussion above, I have to say that migrating batch programs from SP to something outside the DB requires considerable effort, and it is not risk free. However, we should not underestimate the long-run benefit of isolating the data persistence logic. If your application has a complicated database schema, the data will gradually turn into chaos if you let programmers update it directly in their business logic, even if only in batch programs, and this will cause painful maintenance issues afterward.
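To make the point about isolating persistence logic concrete, here is a small hypothetical sketch. AccountDao and InMemoryAccountDao are invented names, and a map stands in for the database; the point is only that both online and batch code would have to go through the same interface, so an integrity rule (here, no overdrafts) lives in one place.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical DPL interface shared by online transactions and batch programs.
interface AccountDao {
    long balanceOf(String accountId);

    void withdraw(String accountId, long amountInCents);
}

// In-memory stand-in for a real JDBC or EJB based implementation.
class InMemoryAccountDao implements AccountDao {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String accountId, long openingBalanceInCents) {
        balances.put(accountId, openingBalanceInCents);
    }

    @Override
    public long balanceOf(String accountId) {
        Long balance = balances.get(accountId);
        if (balance == null) {
            throw new IllegalArgumentException("Unknown account: " + accountId);
        }
        return balance;
    }

    @Override
    public void withdraw(String accountId, long amountInCents) {
        long current = balanceOf(accountId);
        // The integrity rule is enforced here, not in each caller's business logic.
        if (amountInCents <= 0 || amountInCents > current) {
            throw new IllegalArgumentException("Invalid withdrawal: " + amountInCents);
        }
        balances.put(accountId, current - amountInCents);
    }
}
```

A batch job written against AccountDao cannot bypass the rule the way a direct UPDATE statement could.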

Monday, April 21, 2008

HtmlUnit - Java Headless Browser

HtmlUnit

Saturday, September 01, 2007

When to use inner class

This is a question from a friend of mine.

In a class, we use private fields and private methods to hide, or encapsulate, logic from the outside. Sometimes, however, we need a fully functioning class within another class, and the class inside should not be accessible in any way, or even appear to exist, outside the bigger class. This is the situation where we need an inner class.

For example, inside a watch there are many gears. As a user of a watch, you never care how the gears work within it. You don't even care whether the watch works on gears or not (it may be an electronic watch). In this case, a gear could be an inner class of a watch.

A private inner class provides better shielding of complicated logic within a class. The most common use of an inner class is implementing listeners inside another class: you just use an anonymous inner class that implements a specific listener interface.

Although you can also define an inner class as public, I usually do not recommend this, because you can always extract a public inner class into an ordinary public class. I think it makes no sense to use a public inner class unless you want to do something dirty.
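The watch example and the listener case can be put together in a short sketch. All names here (Watch, Gear, TickListener) are made up for illustration. Gear is a private inner class, so code outside Watch cannot see it, while the caller registers a listener with an anonymous inner class.

```java
class Watch {

    // Hypothetical callback interface for anyone who wants to observe the watch.
    interface TickListener {
        void onTick(int seconds);
    }

    // Private inner class: an implementation detail invisible outside Watch.
    private class Gear {
        private final int teeth;
        private int position;

        Gear(int teeth) {
            this.teeth = teeth;
        }

        // Advances by one tooth and reports whether a full turn was completed.
        boolean advance() {
            position = (position + 1) % teeth;
            return position == 0;
        }
    }

    private final Gear secondGear = new Gear(60);
    private int minutes;
    private TickListener listener;

    void setListener(TickListener listener) {
        this.listener = listener;
    }

    void tick() {
        if (secondGear.advance()) {
            minutes++;
        }
        if (listener != null) {
            listener.onTick(secondGear.position);
        }
    }

    int getMinutes() {
        return minutes;
    }
}

class WatchDemo {
    public static void main(String[] args) {
        Watch watch = new Watch();
        // Anonymous inner class implementing the listener interface.
        watch.setListener(new Watch.TickListener() {
            @Override
            public void onTick(int seconds) {
                System.out.println("second hand at " + seconds);
            }
        });
        watch.tick();
    }
}
```

If the watch later became electronic, Gear could be deleted or replaced without any caller noticing, which is exactly the shielding a private inner class is for.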

Saturday, August 25, 2007

Rethink Java and Object Oriented Software Engineering

I have worked as a software developer for over 10 years. My job gives me many opportunities to lead and mentor developers with little or no experience in object-oriented software design and Java programming. They usually ask me many questions. Some of these questions are very conceptual, and not every experienced Java programmer can answer them well. If you look for an answer in books, most are so practical that they tell you how but seldom tell you why.

Most of them send me those questions by instant messenger (IM). It is very difficult to answer them over IM, since it usually takes hundreds of words to explain my ideas. It is also a nightmare to send source code over IM. That's why I decided to answer those Java and OO design FAQs on my blog.

As the very first chapter of the series, I'm going to recap the brief history of Java and, since it is one of the most widely accepted languages for OO software development, some characteristics of the language that make it so popular. If you have been using Java for a long time, you may not be aware of how much these features have been helping you. If you are a programmer who has only used other languages like VB or PL/SQL, please read on and you'll see why so many programmers have shifted to Java for software development.

Before the debut of Java, OO software engineering had already been a hot topic. Like many new technologies, it was first discussed in academic circles, and a lot of research work was done in universities. Quite a number of OO languages were born between the 1960s and the 1980s. One of them was Modula-3, which was widely used in OO software design classes in universities but is now largely forgotten outside campus. Along the way, Bjarne Stroustrup, a Bell Labs researcher, added OO features to the C language, which was one of the most popular languages of that age. C++ appeared in 1985 and, riding on its similarity to C, gained much popularity within a decade. Microsoft also adopted C++ as a language for developing applications on the Windows platform, in the form of the well-known Visual C++.

As C++ jumped from the academic to the business world, people found that its OO features were too difficult for junior programmers to learn. Although it is very extensible, like the C language, there was little API standardization in C++, which made C++ code hard to port between platforms. There are also some disadvantages inherited from the C language.

If you have not used C++ before, it is hard to imagine what the problem is. C++ is like Java with the following differences:

Supports multiple inheritance of classes: a class can extend many classes.
Has no interface construct.
Can access memory directly with pointers; in many cases you must use a pointer when you work with an object.
No automatic garbage collection (if you don't know what GC is, you have been spoiled by Java).
Cannot "write once, run anywhere", not even "write once, compile anywhere".

If you know Java, you should have a rough idea now.

In the early 1990s, James Gosling (known as the father of Java, YES!!) of Sun Microsystems invented the Java language. It looks a lot like C++, but there are also many improvements:

Simplified object inheritance
No operator overloading
No pointers, so an object is an object everywhere in your program, like in Visual Basic
Automatic garbage collection, so far fewer memory leak issues
Its own standard API library for many common operations like I/O, string manipulation, date manipulation, networking, and database access
Last but not least, the mighty "Write once, run anywhere"

As a computer language, Java is much easier to learn and use than C++. For over a decade, Java has continuously evolved to keep up with the growth of the industry and the demands of Java programmers. That is what brought Java to its current place in the industry.

So much for the history; now we have a good OO language. However, before Java got its big piece of the pie in the software development industry, many other tools existed, like PL/SQL, PowerBuilder, and Visual Basic. All of them have been very popular, and they are not OO. So why is OO so important? Why does OO design make software development easier? Many books about OO give you the answer, but most of them are very difficult to understand. I will try to make it easier with an example.

Imagine you go to a restaurant, you sit down and a waiter comes to you. You give him your order. He takes your order, writes it on a piece of paper. Then he goes to the cashier, leaves the paper there for recording. Then the waiter himself walks into the kitchen and starts to cook. After a while your dishes are ready and the waiter brings them to your table from the kitchen.

You finish your meal and ask a waiter for your bill. The waiter goes to the cashier and searches for the paper that recorded all your orders. Then he calculates the amount you need to pay and comes back to your table with your bill.

Consider this restaurant as a software system. All the "waiters" (who are actually also cooks and cashiers) are programmed in a procedural programming language like C or PL/SQL. In this kind of system every program must work from end to end and handle every task that lies on its path. It is very difficult to make changes to this kind of system. If you want to change this restaurant from serving French dishes to Chinese ones, you need to fire all the "waiters" and employ a new team.

Usually, we won't see a restaurant operate in this way, because our world is a world of objects. Waiters, cooks, and cashiers are different parties that work together. There is a separation of duties, which means a waiter doesn't need to know how to cook, and a cook doesn't need to know how to calculate the bill. In OO design we call this "separation of concerns", which is the most fundamental idea of OO software engineering. A software system should be built from a group of objects that are inter-related. This is what we call an "object model". All the features that an OO language provides help developers realize the object model more easily.
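The restaurant can be written down as a tiny object model. This is just one possible sketch; the class names and the "style" strings are invented for illustration. Each class has one duty, and the waiter only coordinates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The cook's duty: prepare a dish. The waiter does not know how.
interface Cook {
    String prepare(String dishName);
}

class FrenchCook implements Cook {
    @Override
    public String prepare(String dishName) {
        return "French-style " + dishName;
    }
}

class ChineseCook implements Cook {
    @Override
    public String prepare(String dishName) {
        return "Chinese-style " + dishName;
    }
}

// The cashier's duty: record orders and calculate bills.
class Cashier {
    private final Map<Integer, List<Integer>> pricesByTable = new HashMap<>();

    void record(int table, int priceInCents) {
        pricesByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(priceInCents);
    }

    int bill(int table) {
        List<Integer> prices = pricesByTable.get(table);
        if (prices == null) {
            return 0;
        }
        int total = 0;
        for (int price : prices) {
            total += price;
        }
        return total;
    }
}

// The waiter's duty: talk to the customer and delegate everything else.
class Waiter {
    private final Cook cook;
    private final Cashier cashier;

    Waiter(Cook cook, Cashier cashier) {
        this.cook = cook;
        this.cashier = cashier;
    }

    String takeOrder(int table, String dishName, int priceInCents) {
        cashier.record(table, priceInCents);
        return cook.prepare(dishName);
    }

    int fetchBill(int table) {
        return cashier.bill(table);
    }
}
```

Switching the restaurant from French to Chinese dishes now means passing a ChineseCook to the Waiter constructor; the Waiter and Cashier classes stay untouched, unlike the procedural version where every "waiter" had to be replaced.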

If you understand this idea, the next question waiting for you is: "How do I build an object model correctly?" This is not an easy question. Thanks to the many gurus who have done a lot of work on OO design, we now have some loose guidelines called "design patterns". They are answers to many common problems that we face in OO design. I'll talk more about them in later articles.

Friday, March 23, 2007

Why Free Software Sometimes Hurts

I was assigned to evaluate a piece of free software, a PHP package. I downloaded the package and read through a very brief "readme.first" document, which indicated that it requires Apache, PHP, and MySQL. Though I had never installed PHP before, I thought this was a very common and popular combination, and I didn't expect too many twists and turns.

First I installed Apache 2.2 for Windows. I had done that thousands of times in my life. Then I downloaded and installed PHP5 for Windows. The installer provides automated configuration for Apache. Pretty!! Finally, I installed MySQL and got it up and running right away.

Everything was ready. I copied the PHP package under the web root and tried to start it. Dada!! Failed. The error log contained an error message that looked like a program bug in the PHP package. I went through the PHP manual and asked someone who knows PHP better than I do. Finally, after a day of struggling, I got it working. In fact it could have been much simpler. Let's see why it took me so long.

1. The PHP5 installer does insert some lines into httpd.conf to load the PHP Apache module, but it forgot an "AddType" statement. I had to add it myself after consulting the installation document and finding that it is necessary.

2. Most PHP programmers use "<? ... ?>" to surround PHP code in PHP files, while some PHP tutorials suggest it can also be done with "<?php ... ?>". All files in the PHP package I was evaluating use the "<? ... ?>" pattern. However, the default PHP settings force us to use "<?php ... ?>". There is a tiny switch in the PHP config (short_open_tag) that enables "<? ... ?>", and I needed to turn it on to make the package work. Hey, why is the switch not turned on by default? It seems to me that all PHP programmers prefer the shorter form.

3. The third issue is the connection to MySQL. After more reading of the PHP manual, I found that the MySQL connection component is no longer included in PHP5 by default. It turned out that I needed to do three things for the MySQL connection. First, I needed to change the "PATH" environment variable to let the system find the file "libmysql.dll", which is provided by MySQL. Second, I needed to re-run the PHP5 installer to add the MySQL extension for PHP. Finally, I modified the PHP config to load the MySQL extension into memory. Why can't they document these procedures better?

Sometimes free software may cost you valuable time in an unexpected way. For software that you pay for, you can call technical support and get your problem solved right away (well, sometimes it may take longer). Luckily, I have never been a great fan of PHP.

Friday, February 16, 2007

Under the hood of J2EE Clustering

A very nice article about Clustering in J2EE server. Worth a read.

Click here to read.

Saturday, September 16, 2006

Stored Procedure, use or not?

Today I found a discussion here about using stored procedures for all database calls in a software application.
The debate has been going on for about 2 years. This is one of those typical software design questions that have no absolute yes or no answer. When dealing with this kind of question, developers usually analyze it in terms of pros and cons, but if we go back and consider the non-functional requirements first, it will be easier to figure out which solution is better in each situation.
First we consolidate all the non-functional requirements here. Most of them can definitely be achieved with or without stored procedures, but for a given requirement one solution is usually better than the other. After we build the full list, we'll have a rule set that helps us determine whether or not to use stored procedures in our own scenario.

Performance
Usually this is the strongest reason for using stored procedures. Without doubt SP can make database operations faster. However, it is also a typical remedy for poorly written queries and poorly designed schemas. If you believe in the good old 80-20 rule, doing all DB operations in SP is a waste of development effort. With modern profiling tools it is easy to separate slow queries from the rest, so you just need to tune the queries that really matter, or wrap them up in SP. On this point, I think a mixed approach is more suitable.
Database Portability
Stored procedures are never portable, but such portability is not always required. Many enterprises use one single RDBMS product and never change. Usually this requirement applies under two scenarios:
1. Your application is a product and your end users can run it on top of different DBs.
2. You plan to use a different DB with your application in the future.
Security
This is usually necessary if you have more than one application accessing the same DB, so that you can grant each application access only to the SPs it really needs while hiding the entire schema. However, that means you are using the DB as a point of application integration. It may have been the only way before we had other application integration technologies like MQ or web services. If you have only one application accessing the DB and do the integration outside the DB, why do you need SP for security?
Service Interface
SP can be treated as a service interface for your application, but nowadays a web service is a better way to do so.
Unit Testing
Whether you use SP or a data access layer with plain SQL, you have lots of ways to do unit testing. So there is no difference.
SP as business logic layer
PL/SQL and T-SQL are not good languages for implementing complicated business logic; it is better to do that in the business service layer.
SP as a layer to maintain data integrity
SP may be a more natural place for maintaining data integrity, though there is no big difference if you do it in the data access layer.
Stop the ripple effect of DB change
Both SP and data access layer can do the same job.

Finally, I want to point out another downside of accessing the DB through SP only. Such a practice easily leads developers to produce an overly database-centric design and to put too much load on the DB server. As a matter of fact, the DB is usually the component that is most difficult to scale up. In contrast, the mid-tier can be scaled easily with modern load balancing technology. Whether we use SP or not, we can use tricks like caching to reduce the number of DB accesses, which also enhances performance.
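As a rough illustration of the caching trick, here is a minimal read-through cache in Java. ReadThroughCache is a hypothetical name, and the loader function stands in for whatever actually queries the DB (an SP call or a DAO method); the sketch assumes the loader never returns null and ignores expiry and size limits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    // The loader is whatever fetches the value from the DB on a cache miss.
    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only the first time a key is seen.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    // Must be called when the underlying row changes, or readers will see stale data.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

Repeated lookups of the same key then hit the mid-tier memory instead of the DB server. The hard part in practice is invalidation, which is one more reason to route all updates through a single data access layer.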