Erlang/OTP is without any doubt a powerful tool. But as with any technology, it is not suited to every use case imaginable. It has a lot of strengths, but some weaknesses too. Erlang shines in everything related to distribution and reliability. If you want to write software that can be arbitrarily distributed over multiple instances, CPU cores, machines or even data centers without having to change the programming paradigm for any particular case, Erlang is probably the perfect tool. But to benefit from it, you should have a problem best solved by a distributed solution. That may be reliable message passing or rock-solid in-memory storage - it is probably not web page generation and delivery.
In the Internet world, Erlang is perfect for middleware or backend systems like caches, message queues and exchanges, databases or storage abstractions. But it is probably not the right tool for writing web applications. Sure, you may add an HTTP endpoint to your messaging middleware or your database, but should you use Erlang the way one usually uses JSP, PHP or Ruby? Probably not. Web applications have a very short life cycle. The business rules are in constant flux and have to be changed over and over again. Defining such rules in Erlang may turn out to be a very hard job. Despite its expressiveness, Erlang is not a language to be embedded the way it is in Yaws dynamic content or in ehtml. It works - but it is comparably hard to maintain and not that fast in terms of execution time.
It's not just that Erlang is a functional language; it's a functional language with a peculiar syntax. There are all those commas, semicolons and dots, and a pretty verbose notation for structured data, the records. The benefits of the syntax - for example how elegantly you can deal with binary data - are of little use when programming the business rules of a web application. The language is simply not designed for such a use case. It is designed for writing reliable software that deals with network communication and related tasks, not for expressing complex rules in a domain-specific language.
On the other hand, the Erlang VM is highly optimized to spawn and execute a huge number of small processes, but it is not optimized to execute a single thread in the shortest possible time. Modern Java implementations may beat current Erlang easily here. Sure, the maintainers of the Erlang VM do a lot of work improving both performance and SMP scalability, but those optimizations are not yet at their end. Raw performance is not the domain of OTP.
Whenever you think about using Erlang, the first question to ask is whether your problem deals with redundancy, scalability or distribution. The second is whether you can do without raw single-thread performance, and whether you are spared the short lifetime of business rules. If you answer these questions with yes, you get a highly optimized and convenient tool for the job.
Arrays are at the core of PHP and act as the Swiss army knife among data structures. They are known as an efficient and generic solution, but that is only half of the truth. The internal C representation as a hash table gives arrays a non-intuitive performance profile. Associative arrays, for example, scale badly with key size, and the different access functions like array_pop() and array_shift() behave very differently with respect to performance. I made a quick benchmark to visualize the difference. At least in loops it is more efficient to reverse the array and pop it instead of simply shifting it, because array_shift() - strangely enough - forces a reindexing of the remaining elements on every call.
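A minimal sketch of such a benchmark; the element count is an arbitrary choice, and absolute timings will vary with PHP version and hardware:

```php
<?php
// Drain the same array twice: once via array_shift(), once via
// array_reverse() followed by array_pop().
$n = 10000;

$a = range(1, $n);
$start = microtime(true);
while ($a) {
    array_shift($a);        // reindexes the remaining elements each call
}
printf("shift:       %.3fs\n", microtime(true) - $start);

$a = range(1, $n);
$start = microtime(true);
$a = array_reverse($a);     // one O(n) pass up front ...
while ($a) {
    array_pop($a);          // ... then cheap removal from the end
}
printf("reverse+pop: %.3fs\n", microtime(true) - $start);
```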
Cloud computing is becoming more and more popular, and it is cool, indeed. With Amazon web services such as EC2, SimpleDB, S3 etc. or with Google App Engine it is possible to build scalable web applications easily - even self-scaling applications. Since AWS is completely based on web services, not just for usage but for resource management too, it should be possible to build an application that detects load peaks and starts up new nodes - all automatically.
Unfortunately, this cool technical feature also adds new attack vectors for black hats. New attacks may be based on the pricing model, which is "pay only for what you use" - and exactly that is the problem. Since your application spreads automatically over an arbitrary number of new nodes and consumes an unlimited amount of resources, attackers do not need to DDoS your application; they just run a DDoP, a "Distributed Denial of Payment" attack.
Their botnets just need to use the application the ordinary way. As the load grows, your resource usage grows, and so do your debts.
There is a reason why startups use open source software, PostgreSQL for example, and not Oracle with a per-processor license. The same reason may let them choose a hosting solution where they pay for what they have, not for what they use. A startup company may quickly reach its financial limits under a DDoP. So read my lips: don't forget to add some kind of throttle and a good bot and crawler detection when you enter the cloud.
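To illustrate, a minimal fixed-window throttle sketch. The function name and the limits are made up for this example, and APC serves as the counter store only as an assumption - in a multi-server setup you would use a shared store like memcached instead:

```php
<?php
// Allow each client IP at most $limit requests per $window seconds.
// The key includes the current window index, so counters expire
// naturally when a new window starts.
function throttle_ok($ip, $limit = 60, $window = 60) {
    $key  = 'throttle:' . $ip . ':' . floor(time() / $window);
    $hits = apc_fetch($key);
    if ($hits === false) {
        apc_store($key, 1, $window);
        return true;                  // first request in this window
    }
    if ($hits >= $limit) {
        return false;                 // over the limit, reject
    }
    apc_inc($key);                    // atomic increment
    return true;
}

if (!throttle_ok($_SERVER['REMOTE_ADDR'])) {
    header('HTTP/1.1 503 Service Unavailable');
    header('Retry-After: 60');
    exit;
}
```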
You may also read this article about "Cost Allocation" as a new computing resource affecting algorithms:
Programmers love to talk about performance, but only a few talk about scalability, despite the fact that in the end only scalability counts. Scalability is about how your effort has to grow with your success. This is what you must keep in mind when you start a business on the web.
It is always a good idea to plan a new project for scalability from the first steps on. There are some easy tasks with just a little development overhead that pay off greatly if your project reaches the level of success you intended:
- divide and conquer in size and time
- use a good abstraction layer for data access so you stay free to partition data later
- avoid complicated joins and excessive normalization; they'll kill you if you have to distribute data over multiple databases and machines
- ask whether you really need a relational schema for everything or whether a simple key-value store would work as well
- cache data access from the beginning (see the sketch after this list)
- cache more, compute less
- use functional decomposition: partition your system into small, efficient units and plug them together behind abstractions
- use asynchronous strategies to manage load peaks
- split static and dynamic content cleanly; soon you may need a CDN to deliver the static part
- apply a good deployment strategy with rollback
- measure and monitor performance and scalability systematically
- scale your revenue in parallel with your technology
Don't miss the last point. It's the most important.
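For the caching item above, here is a minimal cache-through sketch. The function name, the APC calls and the loader callback are illustrative assumptions; the point is that all reads pass through one small abstraction, so the cache or the storage behind it stays swappable:

```php
<?php
// Fetch a value from the cache, falling back to an arbitrary loader
// callback on a miss; the result is cached for $ttl seconds.
function cached_fetch($key, $loader, $ttl = 300) {
    $value = apc_fetch($key, $hit);
    if ($hit) {
        return $value;                    // served from cache
    }
    $value = call_user_func($loader);     // query or compute on a miss
    apc_store($key, $value, $ttl);
    return $value;
}

// Usage with a hypothetical data access function:
$user = cached_fetch('user:42', 'load_user_42');
```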
There are a reasonable number of myths about PHP, and one of them I will challenge today. If you ask an average PHP programmer whether a static function call or a method call on an instance performs better, he or she will probably say - without thinking - static. This is wrong.
Sure, if you want to call a method on an instance you first have to create that instance, and that takes some time of its own. The sum of the CPU time for instantiation and a single method call is in fact greater than the CPU time needed for a single static call. So if you only measure isolated calls you will confirm the myth. But that is the wrong question.
In real life you never do isolated calls. The interesting question is not how a single call performs but how the different kinds of method calls scale. Measure that and you get a different picture: instance calls scale definitely better. Even with just a handful of static method calls on a class, the same calls on an instance - a singleton, for example - will save you time.
The image below shows a measurement with empty methods and lightweight classes on PHP 5.2.5. The point where the two lines cross may vary with the complexity of your class.
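A sketch of the kind of measurement behind that chart; the class, the empty method bodies and the iteration count are placeholders:

```php
<?php
// N static calls versus one instantiation plus N instance calls.
class Demo {
    public static function stat() {}
    public function inst() {}
}

$n = 1000000;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    Demo::stat();
}
printf("static:   %.3fs\n", microtime(true) - $start);

$start = microtime(true);
$demo = new Demo();          // instantiation is paid exactly once
for ($i = 0; $i < $n; $i++) {
    $demo->inst();
}
printf("instance: %.3fs\n", microtime(true) - $start);
```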
You've written a not-so-trivial PHP application and your web site is visited more and more frequently: once per second, 10 times, 20 times per second, more. You moved your database to a second machine, but your web server's performance still breaks down at 25 requests per second. You install a bytecode cache like APC or XCache, but your servers still do not deliver more than 30 or 40 requests. What now?
You may spend some weeks refactoring your whole software. Or you invest 30 minutes in fixing your file inclusions and win some 100% in performance - or even more.
The recipe is easy - watch your server process with strace, look out for file stats and try to reduce their number to zero:
- reduce your include_path to a single entry or don't use it at all
- include or require absolute paths (see the example after this list)
- ensure that every attempt to include a file is a hit
- avoid file_exists(), is_file(), is_dir(), is_link() etc.
- avoid autoloading
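An example of the absolute-path style from the list above; the constant name and the file layout are assumptions, and dirname(__FILE__) is used because __DIR__ only exists as of PHP 5.3:

```php
<?php
// Bootstrap file in the project root: derive an absolute base path once.
define('APP_ROOT', dirname(__FILE__));

// Absolute includes: no include_path search, one stat per file,
// every attempt is a hit.
require_once APP_ROOT . '/lib/Database.php';
require_once APP_ROOT . '/lib/Cache.php';

// Avoid this: a relative include triggers a lookup in every
// include_path entry until the file is found.
// require_once 'lib/Database.php';
```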
Each file system access forces the operating system to switch context and wastes a huge number of CPU cycles. If you have a long include path and the file to include is only found in its last entry, you pay for a lot of useless file stats. Even the bytecode cache cannot protect you from this, since it only caches files that were found, not files that do not exist. For every non-existing file, all the stats and lookups take place again and again. That is what your CPU works on instead of your business logic.