Tuesday, December 2, 2008

MySQL Data and Index Sizes

Here is a very useful MySQL query that I learned today:

SELECT TABLE_NAME,
       TABLE_ROWS,
       DATA_LENGTH / 1024 / 1024 / 1024 AS DATA_GB,
       INDEX_LENGTH / 1024 / 1024 / 1024 AS INDEX_GB
FROM information_schema.TABLES
WHERE table_schema = 'SCHEMA NAME';

Replace the 'SCHEMA NAME' constant with the name of your database schema, and the query will report the current data size and index size of each table in that schema, in gigabytes. Keep in mind that for InnoDB tables, TABLE_ROWS is only an estimate.

The table we were researching had half a billion rows, with 31 gigs of data, and 63 gigs of indexes.
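To make the unit arithmetic in the query above concrete, here is a minimal Java sketch. The names TableSizes, toGigabytes, and indexToDataRatio are made up for this example, and the QUERY constant is only a variant of the query above that returns raw byte counts and takes the schema name as a `?` placeholder, on the assumption that you would bind it through a JDBC PreparedStatement.

```java
final class TableSizes {
    // Variant of the query above: raw byte counts, schema name as a bind parameter.
    static final String QUERY =
        "SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH "
      + "FROM information_schema.TABLES WHERE table_schema = ?";

    // DATA_LENGTH and INDEX_LENGTH are byte counts; dividing by 1024
    // three times converts bytes -> KB -> MB -> GB, as the SQL does.
    static double toGigabytes(long bytes) {
        return bytes / 1024.0 / 1024.0 / 1024.0;
    }

    // How many bytes of index exist for every byte of data.
    static double indexToDataRatio(long dataBytes, long indexBytes) {
        return (double) indexBytes / dataBytes;
    }

    public static void main(String[] args) {
        // The figures from the table mentioned above: 31 GB data, 63 GB indexes.
        long data = 31L * 1024 * 1024 * 1024;
        long index = 63L * 1024 * 1024 * 1024;
        System.out.printf("data=%.1f GB, index=%.1f GB, index/data=%.2f%n",
            toGigabytes(data), toGigabytes(index), indexToDataRatio(data, index));
    }
}
```

A ratio above 1, as with that table, means the indexes take more space than the rows they index, which is usually a hint to review which indexes are actually used.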

Wednesday, January 9, 2008

What you should know about GC, but probably don't

Chaotic Java has an outstanding review of the JVM garbage collectors in two parts. The first is an introduction to mark-and-sweep garbage collection, and the second is a quick, mostly understandable review of concurrent garbage collection.

In my experience working with .NET and Java, I have found that most developers have little to no knowledge of the theory behind garbage collection and reference strength. I fear for the future of development if more developers don't understand the tools of their trade in greater detail. Who knows what products they might be working on, or how acceptable a restart every four minutes will be.
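As a small illustration of what "reference strength" means in Java, here is a minimal sketch using java.lang.ref.WeakReference. The class and method names are made up for this example. The first check is deterministic; the second depends on the collector, because System.gc() is only a hint, so its output may differ between runs and JVMs.

```java
import java.lang.ref.WeakReference;

final class ReferenceStrength {
    // Deterministic part: a weak reference still resolves while a
    // strong reference to the same object is in use.
    static boolean resolvesWhileStronglyHeld() {
        Object payload = new Object();                  // strong reference
        WeakReference<Object> weak = new WeakReference<>(payload);
        return weak.get() == payload;
    }

    public static void main(String[] args) {
        System.out.println("while strongly held: " + resolvesWhileStronglyHeld());

        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        payload = null;   // drop the only strong reference
        System.gc();      // a request, not a guarantee

        // The collector is now allowed, but not required, to clear the reference.
        System.out.println("after release: "
            + (weak.get() == null ? "cleared" : "not yet cleared"));
    }
}
```

A cache built on strong references keeps every entry alive forever; one built on weak or soft references lets the collector reclaim entries, which is exactly the kind of trade-off that is hard to reason about without knowing how the collector decides what is reachable.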

Monday, January 7, 2008

Coconut AIO collects dust.

Coconut AIO is part of the Coconut project at Codehaus. The focus of the Coconut project is highly concurrent internet services. This includes things like caching, multi-threaded IO, etc. Sadly, the AIO component of Coconut is no longer developed and is considered discontinued. Even the Coconut website no longer references the Coconut AIO package.

One of the things I liked about Coconut AIO was that it was based around java.util.concurrent.Future objects. This provided more accessibility to junior developers since the whole model behind selector threads, content filters, and registering socket interests can be rather daunting.
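To show why a Future-based style is approachable, here is a minimal sketch of the general idea. This is not Coconut AIO's API; it uses only the JDK's ExecutorService, and the FutureRead class and readAsync method are hypothetical names. The caller asks for a read, gets a Future back right away, and only blocks when it actually needs the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureRead {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Start a read in the background and hand back a Future immediately.
    Future<byte[]> readAsync(InputStream in, int maxBytes) {
        return pool.submit(() -> {
            byte[] buf = new byte[maxBytes];
            int n = in.read(buf);                 // blocks a worker, not the caller
            return Arrays.copyOf(buf, Math.max(n, 0));
        });
    }

    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        FutureRead io = new FutureRead();
        InputStream in = new ByteArrayInputStream(
            "hello".getBytes(StandardCharsets.US_ASCII));

        Future<byte[]> pending = io.readAsync(in, 16);
        // ... the caller is free to do other work here ...
        byte[] data = pending.get();              // wait only when the result is needed

        System.out.println(new String(data, StandardCharsets.US_ASCII));
        io.shutdown();
    }
}
```

There are no selectors, interest sets, or filters in sight, which is the accessibility argument: a developer who understands Future.get() can use the API without learning the machinery underneath.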

Maybe the Coconut group will revisit the AIO package after they finish work on their caching services.

Wednesday, December 19, 2007

Java NIO and The Grizzly

I have been working on a small side project to develop a highly scalable reporting and analysis service. Part of the design calls for all "processing" nodes to maintain persistent connections to every other "processing" node in the cloud. I knew the idea of using blocking IO and the 1:1 thread/connection model was going to be horrible for this design.

I first turned to the Java NIO packages. I always start by trying to understand the underlying technology before I look into libraries. While I wouldn't say that the Java NIO packages are exceptionally difficult to work with, they leave a lot to be desired in the documentation department. Even the examples and tutorials that exist on the internet appear incomplete. Very few touch on the best methods for dealing with write operations. Between writes and fully understanding how to properly iterate over my selected key set (you have to call iterator.remove() after calling iterator.next()), I spent a week trying to get a firm grasp on what was really going on. By the end of the second week, I had created a prototype Java application that listened on sockets and played hot potato with a serializable Java object. I had acquired my basic understanding, and I was ready to look into libraries.
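The two sticking points mentioned above, iterating the selected key set and handling writes, can be sketched as a small echo server on plain NIO. This is a minimal illustration rather than the prototype described in this post: the SelectorEcho class is a made-up name, the 4096-byte buffer is an arbitrary choice, and it assumes one buffer per connection with interest switched between OP_READ and OP_WRITE.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class SelectorEcho implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorEcho(int port) throws IOException {      // port 0 picks a free port
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress("127.0.0.1", port));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    void close() throws IOException {
        selector.close();                            // wakes up a blocked select()
        server.close();
    }

    public void run() {
        try {
            while (true) {
                selector.select();                   // block until a channel is ready
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();   // the selector never removes keys from this set itself
                    if (!key.isValid()) continue;
                    try {
                        if (key.isAcceptable()) accept();
                        else if (key.isReadable()) read(key);
                        else if (key.isWritable()) write(key);
                    } catch (IOException e) {
                        key.channel().close();       // drop only the failing channel
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // selector was closed or failed: leave the loop
        }
    }

    private void accept() throws IOException {
        SocketChannel ch = server.accept();
        if (ch == null) return;                      // nothing to accept after all
        ch.configureBlocking(false);
        // One buffer per connection, kept in "fill" mode between calls.
        ch.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
    }

    private void read(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        if (ch.read(buf) < 0) {                      // peer closed the connection
            ch.close();
            return;
        }
        // Data is pending: stop reading and ask to be told when we can write.
        if (buf.position() > 0) key.interestOps(SelectionKey.OP_WRITE);
    }

    private void write(SelectionKey key) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        buf.flip();                                  // switch to "drain" mode
        ch.write(buf);                               // may write only part of the data
        buf.compact();                               // keep whatever was not written
        // Keep OP_WRITE only while data is still pending; otherwise read again.
        key.interestOps(buf.position() == 0
            ? SelectionKey.OP_READ : SelectionKey.OP_WRITE);
    }

    public static void main(String[] args) throws IOException {
        new SelectorEcho(9090).run();                // port chosen arbitrarily
    }
}
```

Leaving OP_WRITE registered all the time makes the selector fire constantly, because a socket is almost always writable. Registering for it only while there is unsent data is one common way to handle partial writes without spinning.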

Grizzly is the library I am currently looking at. Getting a simple echo service up and running in Grizzly is a no-brainer. The concept can be completed in under 20 lines of code. Grizzly even comes with protocol parsers and other useful interfaces that make developing your own protocol directly on top of TCP or UDP a straightforward exercise. I haven't gotten too far down the path of implementation yet, but I will definitely be using Grizzly rather than rolling my own NIO solution.

The code for the Grizzly version of a simple echoing server is below:
public static void main(String[] args)
        throws IOException {
    Controller controller = new Controller();

    // Listen for TCP connections on port 9090.
    TCPSelectorHandler handler =
            new TCPSelectorHandler();
    handler.setPort(9090);

    controller.setProtocolChainInstanceHandler(
        new DefaultProtocolChainInstanceHandler() {
            // Reuse a pooled chain when one is available;
            // otherwise build a new read-then-echo chain.
            @Override
            public ProtocolChain poll() {
                ProtocolChain chain = protocolChains.poll();
                if (chain == null) {
                    chain = new DefaultProtocolChain();
                    chain.addFilter(new ReadFilter());
                    chain.addFilter(new EchoFilter());
                }
                return chain;
            }
        }
    );

    controller.addSelectorHandler(handler);
    controller.start();
}

I also researched the following libraries: EmberIO (part of Pyrasun), Apache MINA, and Coconut AIO. I will be posting some of my experiences with these libraries later.

Saturday, May 19, 2007

A small side project can be just the motivation you need.

Some of us entered programming because we have a passion for it. Work tends to smother that passion, but a small utility app or a personal project is all it takes to remind us of the enjoyment this profession can bring.

A personal example is some recent burn-out I was suffering. Work was getting to me, and I was tired of my main project; I would go so far as to say I was dreading it. A friend asked me to write a simple app: just something to take a downloaded OFX file and modify some fields to match how he would prefer to track his accounting. I managed to write the program and deliver the first build to him in under an hour. We spent the next hour trying to figure out why Quicken refused to import the modified OFX. When everything was fixed in the third hour, the end user declared it a stunning success.

It renewed my interest in programming. I felt success and it was good. The next day I went into work, ready to make the larger project a stunning success. Of course, that was the same day all of AOL's servers went up in flames.

Saturday, May 12, 2007

VMware and FreeBSD

My biggest problem with running an OS under virtualization is clock drift. With the default settings, nearly every OS I install suffers from some form of it. I have no idea if this will help anyone, but here are my settings for FreeBSD 6.2+ under VMware.

First, I always rebuild my kernel. There are two Lance drivers: lnc and le. The le driver is newer and handles locking better. I comment out the slower, more trusted lnc device and replace it with the newer one:
#device lnc
device le

FreeBSD 6.1+ (and maybe some older versions) supports both the BusLogic (bt) and the LSI Logic (mpt) SCSI adapters. I personally recommend the BusLogic driver. I forget the exact details, but in the VMware certification classes they said the BusLogic driver was the better-performing driver.

I also enable device polling in order to gain some possible speed boosts. I believe this also reduces error messages you may see from the lnc driver:
options DEVICE_POLLING
options HZ=100

And lastly in the kernel world, I comment out apic. The device apic line is technically deprecated according to the NOTES file, but it is still in there anyway.
#device apic

After rebuilding/installing the world, I also make the following changes to /boot/loader.conf. These lines really just reinforce what I did in the kernel configuration file and should work even if you don't rebuild your kernel:
hint.apic.0.disabled="1"
kern.hz="100"

After all this, you shouldn't see any error messages from the lnc driver, and you shouldn't see any issues with clock drift. I am still trying to figure out the best way to get the vmxnet driver working.

Monday, May 7, 2007

Open Source License Business

If you are developing open source software, the number of open source licenses you have to choose from is enormous. Your choices range from public domain to the GPL to dual licensing, and more.

And that is a good thing. While making choices is hard, having a choice is always a good thing. In the case of open source licenses, the plethora of options gives you the opportunity to decide how your software impacts other developers and corporations. The two licenses I run into most often are the BSD license and the GPL license.

The BSD license is about user's freedom and author's credit. The user has the right to do anything and everything with the licensed material. The only restriction is that the original author gets credit for the original author's material.

Think about it this way. BSD promotes free trade and a constant exchange of ideas based on individual freedoms and values. The marketplace in this free-trade economy is populated by the developers of the world. The copyright holder or licensor has no power or authority to require a tithe or change-sets. Under the BSD license, the world becomes a free market where ideas are free to be used in any way possible.

In contrast, the GPL license is about the material's freedom. The user has the right to do anything with the licensed material as long as he makes an attempt to put all of his changes and usages into the open source community.

The GPL license was designed to make sure that open source code never finds its way into proprietary software. It takes away the end user's freedom to attempt to make improvements for personal gain; however, this loss of individual freedom comes with the benefit of an empowered community. All of the developers can rest assured that their code is not going to disappear and become closed source.

So in summary, the BSD license is not "promoting" proprietary code. It is promoting real individual freedom and opportunity. The BSD license can be considered anarcho-capitalism's equivalent in licenses.

The GPL license does not promote individual freedom; it promotes a community built around the good of all. The GPL license can be considered socialism's equivalent in licenses.

Think about what kind of impact you want your code to have on the world, and make your decision based on that. And remember, no matter what license you choose, the copyright holder always has the power to change their mind.