Tuesday, October 6. 2009
Installing Google Wave on a laptop is relatively easy and straightforward. You just have to follow the instructions here: wave-protocol installation. If you have any problems with Openfire, such as non-working admin passwords, you may switch to the well-known ejabberd, an XMPP server written in Erlang. Installation instructions for this setup can be found here: using google wave reference implementation with ejabberd.
Once you have an XMPP server running, create the certificates as described here: wave-protocol certificates. Then copy the *.cert and *.key files into your wave-protocol root directory.
Next, edit ./run-config.sh and assign the proper values for your setup to the variables. Hint: don't forget to comment out or remove the guard line at the top of the script.
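In the wave-protocol sources I have seen, run-config.sh starts with a guard roughly like the following (the exact wording may differ in your checkout):

```shell
# Guard at the top of run-config.sh: comment out or delete these
# lines once you have filled in your own settings below.
echo "You must configure run-config.sh before running the server"
exit 1
```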
When all this has been done you may start the server:
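In the reference implementation the server is started with a shell script in the wave-protocol root (script name assumed from the FedOne sources; adjust if your checkout differs):

```shell
./run-server.sh
```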
If all went well you will see some INFO logging in your terminal. Next, open two additional terminal windows and start one wave client per window, each with a different username:
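For example, for the first user (the client script name is an assumption based on the reference implementation; replace your-domain with the domain you configured):

```shell
./run-client-console.sh mr-foo@your-domain
```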
and in the other terminal:
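Again assuming the reference implementation's console client script:

```shell
./run-client-console.sh mr-bar@your-domain
```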
Now you are ready to initiate a conversation between Mr. Foo and Mr. Bar. In Mr. Foo's window type "/new". This creates a new wave. Then type "/add mr-bar@your-domain" to add Mr. Bar to the wave. In the other window you should see the wave appear. There, Mr. Bar may type something like "Hello Mr. Foo". Pressing enter makes the greeting appear in Mr. Foo's window. You've just run a Wave conversation on your laptop!
Below you find a list of the available commands in the console client. Just type a slash followed by the command name and its arguments, then press enter.
Wednesday, August 19. 2009
What is this awesome social networking all about? In essence, it is maintaining a social graph and organizing data exchange over that graph. At a reasonably abstract level, it is nothing more. The hype around social networking is fading nowadays, fighting some late battles with social gaming and the like. It is nothing new or exciting anymore but the standard tool of communication.
In this very situation Google has launched its Wave technology. In essence, again, this is social networking on the next level. Wave looks like a browser-based collaboration or groupware tool, but in fact it is much more. It enables people to communicate and share documents in any way and - that's the point - all in real time.
As if real time were not enough, it also allows you to track whole histories of communication. Once combined with geolocation, you have an application reflecting every dimension of the human mind: time, space, and communication.
One side of the coin is that, with Wave, Google is definitely the Microsoft of our days. This is the killer application the web has been waiting for since the first days of dynamic content. An office suite or email client is a last-century application; Wave is the new century. In conjunction with cloud computing - a kind of next-generation operating system - Google sits firmly at the top of the ecological pyramid. The other side of the coin is that Google has reinvented social networking. There is not a single feature of recent social networks that is not included and seamlessly integrated in the Wave package.
So, if you ever think of integrating Wave into your social apps, you should be aware that in fact you are integrating your social graph into Google Wave.
Wednesday, May 27. 2009
Erlang/OTP is without any doubt a powerful tool. But as with any technology, it is not good for every use case imaginable. It has a lot of strengths but some weaknesses too. Erlang shines at everything related to distribution and reliability. If you want to write software that can be arbitrarily distributed over multiple instances, CPU cores, machines, or even data centers without having to change the programming paradigm for any particular case, Erlang is probably the perfect tool. To benefit from Erlang you should have a problem best solved with a distributed solution. This may be reliable message passing or rock-solid in-memory storage, but it is probably not web page generation and delivery.
In the Internet world, Erlang is perfect for middleware or backend systems like caches, message queues and exchanges, databases, or storage abstractions. But it is probably not the right tool for writing web applications. Sure, you may add an HTTP endpoint to your message middleware or database, but should you use Erlang the way one usually uses JSP or PHP or Ruby? Probably not. Web applications have a very short life cycle. The business rules are in constant flux and have to be changed over and over again. Defining such rules in Erlang may turn out to be a very hard job. Despite its expressiveness, Erlang is not a language to be used as an embedded language, as in Yaws dynamic content or ehtml. It works - but it is comparably hard to maintain and not that fast in terms of execution time.
It is not that Erlang is a functional language; it is that Erlang is a functional language with a peculiar syntax. There are all those commas, semicolons, and dots, and a pretty verbose notation for associative arrays, the records. The benefits of the syntax, for example in dealing with binary data, are of little use when programming the business rules of a web application. It is simply not designed for such a use case. It is designed for writing reliable software that deals with network communication and related tasks, not for expressing complex rules in a domain-specific language.
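To illustrate the record verbosity: every read and every update must repeat the record name. A minimal sketch (the record and field names are made up):

```erlang
-record(user, {name, email, visits = 0}).

%% Every access repeats the record tag #user - once in the pattern,
%% once per field read, once per field update.
greet(U = #user{name = Name}) ->
    U2 = U#user{visits = U#user.visits + 1},
    {ok, "Hello " ++ Name, U2}.
```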
On the other hand, the Erlang VM is highly optimized to spawn and execute a huge number of small processes, but it is not optimized to execute a single thread in the shortest possible time. Modern Java implementations may beat current Erlang easily here. Sure, the maintainers of the Erlang VM do a lot of work improving both performance and SMP scalability, but all those optimizations have not yet reached their end. Raw performance is not the domain of OTP.
Whenever you think about using Erlang, the first question to ask is whether your problem deals with redundancy, scalability, or distribution. The second is whether you can do without top single-thread performance and without rapidly changing business rules. If you answer these questions with yes, you get a highly optimized and convenient tool for the job.
Monday, April 6. 2009
At the end of the month a new Erlang release will be available for production: R13B. What I know so far is very promising. A lot of work has been done to improve SMP performance and flexibility. Besides remarkable internal optimizations, for example the multiple run queues, a number of switches have been added to give better control over SMP behavior. You can now bind schedulers to logical processors or change the number of active schedulers at runtime, just to name a few. Another goodie is the added Unicode support.
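For example (flag spellings as I understand the R13 documentation; check the erl man page of your release before relying on them):

```shell
# Start the VM with 4 schedulers, bound to logical processors
# (+S sets the scheduler count, +sbt db picks the default bind type).
erl +S 4 +sbt db

# Inside the Erlang shell, the number of active schedulers can be
# changed at runtime:
#   erlang:system_flag(schedulers_online, 2).
```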
To check the effect of the new run queues I ran a little benchmark with R13A. The program creates a ring of 5000 processes and executes a simple leader election algorithm with O(n^2) communication complexity. The machine was a 4-way 1.8 GHz Opteron running 64-bit Linux 2.6.18. Kernel polling was active. As you may see in the picture below, the program has not been written with good asynchrony in mind, since it almost always performs better on a single core. That is perfect for the benchmark, as it really challenges the SMP implementation. The multiple run queues bring a remarkable benefit in both performance and scalability, and this may get even better with an increasing number of cores.
Investment in Erlang is investment in the future. With every new release the makers of the OTP runtime will add more and better SMP features. The focus of the runtime is still more on distribution and reliability, but its performance and scalability will soon reach those of other technologies.
The same test running on an 8-core 3 GHz Intel Xeon with 64-bit Linux 2.6.9:
Friday, March 27. 2009
Arrays are at the core of PHP and act as the Swiss army knife of data structures. They are known as an efficient and generic solution. But this is only half of the truth. The internal C representation as a hash table causes non-intuitive performance behaviour. Associative arrays, for example, scale badly with key size, and the different accessor functions like array_pop() and array_shift() behave very differently with respect to performance. I made a quick benchmark to visualize the difference. At least in loops it is more efficient to reverse the array and pop it instead of simply shifting it. This is because array_shift() - strangely enough - forces a reindexing operation on each call.
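A minimal sketch of such a benchmark (loop size and timing code are illustrative; absolute numbers depend on your PHP build):

```php
<?php
// Drain an array from the front, two ways.
// array_shift() reindexes the remaining elements on every call,
// while array_pop() just removes the last element, so reversing
// once and popping scales much better in a loop.
$n = 10000;

$a = range(1, $n);
$t = microtime(true);
while ($a) {
    array_shift($a);          // forces reindexing on each call
}
$shiftTime = microtime(true) - $t;

$b = array_reverse(range(1, $n));
$t = microtime(true);
while ($b) {
    array_pop($b);            // no reindexing needed
}
$popTime = microtime(true) - $t;

printf("shift: %.4fs  reverse+pop: %.4fs\n", $shiftTime, $popTime);
```

Both loops consume the elements in the same order; only the cost per step differs.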
This blog post is an appendix to another blog post I wrote about benchmarking SPL data structures.
Tuesday, February 3. 2009
Cloud computing is becoming more and more popular. And it is cool, indeed. With Amazon web services such as EC2, SimpleDB, S3, etc., or Google App Engine, it is possible to build scalable web applications easily - even self-scaling applications. Since AWS is completely based on web services, not just for usage but for resource management too, it is possible to build an application that detects load peaks and starts up new nodes, all automatically.
Unfortunately, this cool technical feature also adds new attack vectors for black hats. New attacks may be based on the pricing model, which is "pay only for what you use" - and attackers exploit exactly that. Since your application spreads automatically over an arbitrary number of new nodes and consumes an unlimited amount of resources, attackers do not need to DDoS your application; they just run a DDoP, a "Distributed Denial of Payment" attack.
Their botnets just need to use the application in the ordinary way. As the load grows, your resource usage grows, and so do your debts.
There is a reason why startups use open source software, PostgreSQL for example, and not Oracle with a per-processor license. The same reason may let them choose a hosting solution where they pay for what they have, not for what they use. A startup company may soon reach its financial limits under a DDoP. So read my lips: don't forget to add some kind of throttle and good bot and crawler detection when you enter the cloud.
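Such a throttle can be as simple as a per-client token bucket. A sketch (class and method names are made up; a real deployment would keep one bucket per client IP in shared storage such as APC or memcached):

```php
<?php
// A token bucket: each request costs one token, tokens refill at a
// fixed rate, and requests are rejected when the bucket is empty.
// This caps how fast a single client can burn paid resources.
class TokenBucket
{
    private $rate;    // tokens added per second
    private $burst;   // maximum bucket size
    private $tokens;  // current fill level
    private $last;    // timestamp of the last refill

    public function __construct($rate, $burst)
    {
        $this->rate   = $rate;
        $this->burst  = $burst;
        $this->tokens = $burst;
        $this->last   = microtime(true);
    }

    public function allow()
    {
        $now = microtime(true);
        $this->tokens = min($this->burst,
            $this->tokens + ($now - $this->last) * $this->rate);
        $this->last = $now;
        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A request handler would look up the bucket for the client's address, call allow(), and answer with an error status when it returns false.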
You may also read this article about "Cost Allocation" as a new computing resource affecting algorithms:
Thursday, November 27. 2008
In the last years we have witnessed the amazing growth of web communities to previously unimaginable sizes. Literally tens of millions of people collaborate in a single web community or social network and contribute their personal content to give the cloud its worth. The success of the large communities is followed by a large number of copycats all over the world trying to participate in the business model. This is what I call the goldrush of the communities.
It never actually was a goldrush but simply a rush. The gold is still missing. Only a handful of these communities earn a profit. Most of them promise profit for the near future and offer the sheer number of members as proof. But the business model of getting revenue by advertising alone is somewhat controversial. Paid services are in the minority. People participate in the Web 2.0 for free. Most communities live on the hope of exploiting, "soon", the huge numbers they deal with every day.
For now, there seems to be no place for new communities. The claims are staked. The market will consolidate in the next few years and only the largest networks will survive, along with maybe some niche communities. If you are thinking about a new business, you should not try yet another social network.
This doesn't mean there is no place for new business in the networking domain. Since the huge social networks are becoming platforms for services rather than services by themselves, the room is open to establish exactly that: services for communities. If you have special knowledge and know how to bring it to the internet, make it embeddable, prepare your service for high traffic, and write apps and gadgets for the major networking platforms.
The areas of interest are mainly:
- mobile devices
- location based services
- micro shops
- embeddable games
- multimedia delivery
The goldrush today is not for communities but for services to support communities with valuable features.
Tuesday, November 18. 2008
Programmers love to talk about performance, but only a few talk about scalability, despite the fact that it is only scalability that counts. Scalability is about how your effort has to grow with your success. This is what you must keep in mind if you start a business for the web.
It is always a good idea to plan a new project for scalability from the first steps on. There are some easy tasks with just a little development overhead but great benefit if your project reaches the level of success you intend:
- divide and conquer in size and time
- use a good abstraction layer for data access to be free if you need to partition data
- avoid complicated joins and excessive normalization; they'll kill you if you have to distribute data over multiple databases and machines
- ask whether you really need relational database schemas for everything or whether a simple key-value store would work as well
- cache data access from the beginning
- cache more, compute less
- use functional decomposition, partition your system in tiny and efficient units and plug them together by abstraction
- use asynchronous strategies to manage load peaks
- split static and dynamic content carefully, soon you may need a CDN to deliver static content
- apply a good deployment strategy with rollback
- measure and monitor performance and scalability systematically
- scale your revenue in parallel with your technology
Don't miss the last point. It's the most important.
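The abstraction and caching points from the list can be combined: hide storage behind a small interface so a cache (or later a partitioned backend) can be slipped in without touching callers. A sketch with made-up names:

```php
<?php
// A thin data-access abstraction: callers only see UserStore, so
// the backing storage can be replaced or wrapped (cache, sharding)
// without changing any caller.
interface UserStore
{
    public function find($id);
}

class DbUserStore implements UserStore
{
    public $queries = 0;   // exposed here only to show the caching effect

    public function find($id)
    {
        $this->queries++;
        // imagine: SELECT * FROM users WHERE id = ?
        return array('id' => $id);
    }
}

class CachingUserStore implements UserStore
{
    private $inner;
    private $cache = array();

    public function __construct(UserStore $inner)
    {
        $this->inner = $inner;
    }

    public function find($id)
    {
        if (!isset($this->cache[$id])) {
            $this->cache[$id] = $this->inner->find($id);
        }
        return $this->cache[$id];
    }
}
```

Swapping DbUserStore for a sharded or key-value-backed implementation later is then a one-line change where the store is constructed.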
There exists a reasonable number of myths about PHP, and today I will challenge one of them. If you ask an average PHP programmer whether a static function call or a method call on an instance performs better, he or she will probably say - without thinking - static. This is wrong.
Sure, if you want to call a method on an instance you first have to instantiate it, and that takes some time of its own. The sum of the CPU time for instantiation and a single method call is in fact greater than the CPU time needed for a single static call. So, if you ask about the cost of single calls, you will support the myth. But the question is wrong.
In real life you will never do isolated calls. The question of interest is not how a single call performs but how the different kinds of method calls scale. If you measure this you get a different picture: instance calls definitely scale better. Even if you have just a handful of static method calls on a class, the same calls on an instance - a singleton, for example - will save you time.
The image below shows a measurement with empty methods and lightweight classes on PHP 5.2.5. The point where the two lines cross may vary with the complexity of your class.
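The measurement can be sketched like this (empty methods, with the one-time instantiation cost included in the instance timing; whether the instance side actually wins depends on your PHP version):

```php
<?php
// N calls to a static method vs. N calls on a single instance.
class Target
{
    public static function staticNoop() {}
    public function instanceNoop() {}
}

$n = 1000000;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    Target::staticNoop();
}
$staticTime = microtime(true) - $t;

$t = microtime(true);
$obj = new Target();            // the one-time instantiation cost
for ($i = 0; $i < $n; $i++) {
    $obj->instanceNoop();
}
$instanceTime = microtime(true) - $t;

printf("static: %.4fs  instance: %.4fs\n", $staticTime, $instanceTime);
```

Plotting both times over a growing $n gives the two crossing lines the text describes.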
Monday, November 17. 2008
You've written a not-so-trivial PHP application and your web site is visited more and more frequently: once per second, 10 times, 20 times per second, more. You moved your database to a second machine, but still your webserver performance breaks down at 25 requests per second. You install a bytecode cache like APC or XCache, but your servers still do not deliver more than 30 or 40 requests. What now?
You may spend some weeks refactoring your whole software. Or you invest 30 minutes in fixing your file inclusions and win some 100% in performance - or even more.
The recipe is easy - watch your server process with strace, keep an eye on the file stats, and try to reduce them to zero:
- reduce your include_path to a single entry or don't use it at all
- include or require absolute paths
- ensure that every attempt to include a file is a hit
- avoid file_exists(), is_file(), is_dir(), is_link() etc.
- avoid autoloading
Each file system access forces the operating system to switch context and wastes a huge number of CPU cycles. If you have a long include path and the file to include is found at the end, you get a lot of useless file stats. Even the bytecode cache cannot protect you from this, since it caches only files that were found, not files that do not exist. For each non-existent file, all the stats and lookups take place again and again. This is what your CPU works on instead of your business logic.
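On Linux you can watch the stats directly; a sketch, assuming strace is installed and you know the PID of a busy worker process:

```shell
# Count file-related syscalls of a running PHP worker for a while,
# then press Ctrl-C to get the per-syscall summary table.
strace -c -p <PID> -e trace=stat,lstat,open,access
```

A long column of failed stat() calls in the summary is the smell this recipe removes.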
Stats are evil.
Sunday, November 16. 2008
Today I open my weblog, and you are all welcome to read and comment on the posts. I hope to share some valuable experience on how to write and run web applications. I have been programming in different computer languages for about 20 years and on the LAMP stack since the late 90s. The "P" stood for Perl at first and, since 2001, for PHP. Ever since, I have been looking for alternatives to the somewhat overpragmatic PHP technology. On the other hand, the market forced me to use rapid technologies. I tried Java and C#, but neither seems rapid enough if you program the Web X.0 for business. Today we have better alternatives in Ruby (on Rails) or Groovy (on Grails). With the rise of the Web 2.0 I am facing server loads I never dreamed of, literally many thousands of PHP hits - per second. This forced me to adopt strategies of load balancing and partitioning. Now I really know about "divide and conquer". The telcos have dealt with high loads for years, and so they have tools like Erlang to serve them. I gave it a look, and it looked strange and beautiful. Not as beautiful as Haskell, though. Well, whatever you read here, it is backed up by practice. It is definitely up to you whether you believe me or not.