Wednesday, September 02, 2009

What will the next generation Operating System look like?

A few days ago I had a chat with my friends, and the topic turned to the latest trend in operating system (OS) evolution.  Web-based applications are getting more and more powerful, so casual users will rely on them more in the future, which means they will need to install fewer offline applications on their own hard disks.  The rise of small, relatively low-end laptop computers (netbooks) is a good indication of this phenomenon.  With this trend in mind, I am making a bold prediction about the OS we will see in 5 to 10 years' time.  You may or may not agree with me; all I want is to start a discussion.

Web Browser Everywhere
More than 10 years ago, someone seeing a first-generation web browser such as Mosaic could hardly have imagined how much it and the Internet would be able to do.  Now we send email, edit documents, watch videos, and play games with just a web browser on our own computers.  I expect the web browser to become a more advanced "shell" in our operating system (like "cmd" in Windows and ksh in Unix), since every OS needs a shell process to run applications.  Even better, web browsers on different OSes all agree on a set of open standards such as HTML and JavaScript.  That means such an architecture brings another level of "write once, run anywhere" to application developers.  As web browsers continue to evolve, we will have even more powerful and stable web-based applications.

So will the future OS really be just a web browser and nothing else?  Will users really have no need to install any software locally?  No, but things will work in a different way.

All Applications are Web Based
You may love to edit your documents on Google Docs, but you may not want to put any copy of your documents on the Internet for security reasons.  Someday you may be able to install a local version of Google Docs on your own computer.  Your OS will come with an application server that supports the most commonly used web application architectures, such as Java EE, PHP, and maybe .NET.  Installing new software on your computer will actually mean installing a new web application package on the application server in your OS.  So you will still use the application through a web browser, with a look and feel very similar to the online version, but in fact it will only connect to your local application server.  Applications that require a lot of runtime resources will be deployed this way.
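To make the idea a little more concrete, here is a minimal sketch in Java of a "locally installed" web application. It is only an illustration under my own assumptions: the class name LocalDocsApp, the /docs path, and the page content are all made up, and the JDK's small built-in HTTP server stands in for the full application server that the future OS would ship with. The point is simply that the browser talks to the loopback address, so nothing leaves the machine.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.nio.charset.StandardCharsets;

/** Hypothetical local web application; all names here are illustrative only. */
final class LocalDocsApp {

    /** Starts the app on the loopback interface only, on any free port. */
    static HttpServer start() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/docs", exchange -> {
                byte[] body = "<h1>My documents (served locally)</h1>"
                        .getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Plays the role of the web browser: fetches a page from the local server. */
    static String fetch(int port, String path) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + port + path)
                    .toURL()
                    .openConnection(Proxy.NO_PROXY);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        System.out.println("Open http://127.0.0.1:" + port + "/docs in a browser");
        System.out.println(fetch(port, "/docs"));
        server.stop(0);
    }
}
```

A real application server would of course add packaging, deployment, and security on top of this; the sketch only shows the browser-to-localhost round trip.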

Internet as Storage
As Internet bandwidth becomes cheaper, we'll find that buying storage on the Internet is more cost-effective than buying local storage for our own computers.  At some point in the future, uploading and downloading files to and from an Internet storage provider will be fast enough that the file explorer in our OS can access it seamlessly, and users can use it just like a local drive.  So when users want more storage space, they will not buy a new local hard disk but pay the storage provider for extra space.  The storage provider will bear the responsibility for maintaining the storage infrastructure, backing up users' data, and keeping the data as secure as possible.  So there will be no point in keeping your files on a local hard disk unless they are very confidential.

For example, if you buy an Eee PC from Asus today, they provide 2 GB of Internet storage space to the buyer.  The file explorer of the Eee PC can mount the storage space as a drive, so managing files on it is just like managing files on the local hard disk (provided that your Internet connection is fast enough).

Local Database for Application
Almost every complicated application needs a database of some sort.  Since the future OS will still run local web-based applications, it should also provide some kind of local database engine, such as MySQL, or a file-based database such as Derby.  When a user installs a local application, the installation program will create a new local database if necessary.

Conclusion
At the moment, I think the nearest thing I can find on the market is Linux.  Maybe in a few years' time a new Linux distribution will have some of the features I have predicted.  The new OS from Google is another hope, since this kind of OS would be a perfect match for their current services.  What do you think?

Wednesday, July 22, 2009

Consideration of running batch programs without Stored Procedures

A few days ago, I joined a discussion with one of my company's customers about software application architecture. Once again, the good old debate about using stored procedures (SP) was brought to the table.

The customer, who used stored procedures quite intensively for database operations in their legacy systems, seemed to give in this time. They agreed that for online transactions, DB operations can be implemented in a Java-based Data Persistence Layer (DPL), such as DAOs or Entity EJBs. For batch programs, however, we proposed that stored procedures can still be used. The customer seemed happy. After all, this design looks natural and low-risk, and the DB programmers hired by the customer won't lose their jobs.

Running batch programs with SP seems a very natural decision. Batch programs usually involve processing a large amount of data in the database, and the time window for a batch program to run is usually very limited. Many application systems can only allocate 8 hours a day for running dozens of batch jobs. So performance is also a key issue, and SP is usually the fastest answer to this type of requirement.

Actually, in my experience, SP does not always help. In my career I have seen a weekly batch program with thousands of SQL scripts take more than 24 hours to finish. The problem is that no matter how fast an SP can execute, the only resources available to SP programs are the CPU and memory of one DB server.

The believers in Object-Oriented Programming have continuously been telling us to put all persistence logic in the DPL. Performance aside, using SP to update data in a batch process still defeats the whole purpose of using a DPL. After some time, the old data integrity issues will still emerge, as if no DPL were being used. So if you think the hybrid solution I proposed to my client is the ultimate answer, you are in fact fooling yourself.

So I have thought again and again about how a batch program can be implemented in Java while still meeting all requirements, mainly on performance. I have some answers below. If you are an OOP person, please read them and kindly share your thoughts with me. If you are an SP person, please let me know how you would maintain a batch program written in SP that needs more than 24 hours to run, without redesigning the database.

Problem 1: Performance

As I mentioned above, performance is usually the biggest hurdle to implementing batch programs outside the database. Why does this have to be slower? The answer is simple. This way the system has to send the data from the DB to another server through the network, transform the data into another structure (e.g. Java objects), and write the result back to the DB through the network again. Compared with using SP, all of this is additional overhead. We cannot completely avoid this overhead, but there is one way to compensate for it.

When you run a batch program with SP, you have to use the CPUs in the DB server. No matter how expensive your server box is, all you have is usually 2 or at most 4 CPUs. With this limited computing power, they have to perform all the operations, including queries, calculations, and updates.

When you run your batch program outside your DB server, you are actually moving your calculations to other CPUs. Not only do you get more CPUs to do the job, since different servers can run independent batch programs, but your DB server also has less work to do, so queries and updates go faster. With proper design, I think the performance will be at least comparable with the traditional SP approach.
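As a rough illustration of spreading the calculation over CPUs outside the DB, here is a minimal Java sketch. Everything in it is an assumption of mine: the class name PartitionedBatch, the made-up calculateInterest rule, and the in-memory list that stands in for rows fetched from the DB. It also uses a thread pool on one machine, whereas the idea above is about separate servers; the partitioning principle is the same, but this is not a measured or tuned design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batch job that splits its records across worker threads. */
final class PartitionedBatch {

    /** Made-up per-record business rule: 3% of the balance, in cents. */
    static long calculateInterest(long balanceCents) {
        return balanceCents * 3 / 100;
    }

    /** Splits the records into roughly equal slices and sums the results. */
    static long run(List<Long> balances, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Ceiling division so that every record lands in exactly one slice.
            int chunk = (balances.size() + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < balances.size(); start += chunk) {
                int end = Math.min(start + chunk, balances.size());
                List<Long> slice = balances.subList(start, end);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (long balance : slice) {
                        sum += calculateInterest(balance);
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that slice to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // In a real batch these values would be read from the DB in pages.
        List<Long> balances = Arrays.asList(100L, 200L, 300L, 400L, 500L);
        System.out.println("Total interest (cents): " + run(balances, 2));
    }
}
```

In a real batch each worker would read and write its own key range through the DPL, so the DB server only serves queries and updates while the calculation runs elsewhere.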

Problem 2: Memory Management

Surprisingly, there is another factor that makes batch processing with a program outside the DB difficult, or sometimes considered impossible: memory management.

It is very common for a DB to contain many gigabytes or even terabytes of data. In SP, it is relatively easy to create a temporary data set to store transient data for calculation, and the DB can usually handle this without much difficulty. However, for a program outside the DB, the available memory is usually a few gigabytes, or less than 2 GB for a Java program. If your program has to sort or group a large amount of data, you may easily hit an out-of-memory error.

The answer to this problem is to save the data in temporary files on disk. For example, in Java, you may implement common data structures such as List or Map so that they serialize their content and save it in temp files. As these actions are wrapped inside the data structure class, reading and writing the temp files is hidden from the business logic.
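Here is a minimal sketch of what such a wrapper could look like in Java. The class name DiskBackedList is hypothetical, and to keep it short it is append-only and does not implement the full java.util.List interface. It uses standard Java serialization into a temp file, and the business logic just calls add() and loops over the elements.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical append-only list that keeps its elements in a temp file, not on the heap. */
final class DiskBackedList<T extends Serializable> implements Iterable<T> {
    private Path file;
    private ObjectOutputStream out;
    private int size = 0;

    DiskBackedList() {
        try {
            file = Files.createTempFile("batch-", ".tmp");
            out = new ObjectOutputStream(new BufferedOutputStream(Files.newOutputStream(file)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void add(T item) {
        try {
            out.writeObject(item);
            out.reset(); // forget already-written objects so the stream's cache stays small
            size++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int size() {
        return size;
    }

    /** Streams the elements back from disk one at a time, in insertion order. */
    @Override
    public Iterator<T> iterator() {
        final int count = size; // only elements added before this call are returned
        if (count == 0) {
            return Collections.emptyIterator();
        }
        final ObjectInputStream in;
        try {
            out.flush();
            in = new ObjectInputStream(new BufferedInputStream(Files.newInputStream(file)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<T>() {
            private int read = 0;

            @Override
            public boolean hasNext() {
                return read < count;
            }

            @Override
            @SuppressWarnings("unchecked")
            public T next() {
                if (read >= count) {
                    throw new NoSuchElementException();
                }
                try {
                    T item = (T) in.readObject();
                    if (++read == count) {
                        in.close(); // closed only when fully consumed (a simplification)
                    }
                    return item;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } catch (ClassNotFoundException e) {
                    throw new IllegalStateException(e);
                }
            }
        };
    }

    /** Closes the writer and removes the temp file. */
    void close() {
        try {
            out.close();
            Files.deleteIfExists(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The calling code could then be as simple as `for (String row : list) { ... }`, with no sign of the temp file. Sorting or grouping data that does not fit in memory would need more than this (for example, an external merge sort over several such files), so treat it as a starting point only.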

Conclusion

After the discussion above, I have to say that migrating batch programs from SP to something outside the DB requires considerable effort and is not risk-free. However, we should not underestimate the long-term benefit of isolating the data persistence logic. If your application has a complicated database schema, the data in the database will gradually turn into chaos if you let programmers update it directly in their business logic, even if only in batch programs, and that causes painful maintenance issues afterward.