A few hours after I posted about DataSift's architecture, @choult, one of the roughly 25 ninjas who develop the DataSift platform, tweeted me.
The following SlideShare presentation by @stuherbert, another ninja, covers how PHP is used at DataSift. Contrary to what you might expect, PHP is widely used in data processing.
The system can be decomposed into three major data pipelines:
- Data Archiving (adds new data to the Historic Archive)
- Filtering Pipeline (filters and delivers data in real time)
- Playback Pipeline (filters and delivers data from the Historic Archive)
PHP is used in many parts of these pipelines.
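To make the relationship between the three pipelines concrete, here is a toy sketch in Python. It is my own illustration, not DataSift's code: the class and function names, the in-memory list standing in for the Historic Archive, and the idea that playback reuses the same filtering step are all assumptions made for the example.

```python
from typing import Callable, Iterable, List

Interaction = dict  # hypothetical stand-in for one item from the firehose
Filter = Callable[[Interaction], bool]
Deliver = Callable[[Interaction], None]


class HistoricArchive:
    """Toy in-memory archive (assumption: the real one is a proper data store)."""

    def __init__(self) -> None:
        self._items: List[Interaction] = []

    def add(self, item: Interaction) -> None:
        self._items.append(item)

    def replay(self) -> Iterable[Interaction]:
        # Iterate over a copy so replaying never disturbs ongoing archiving.
        return iter(list(self._items))


def archive_pipeline(stream: Iterable[Interaction], archive: HistoricArchive) -> None:
    """Data Archiving: add every incoming item to the archive."""
    for item in stream:
        archive.add(item)


def filtering_pipeline(stream: Iterable[Interaction], matches: Filter, deliver: Deliver) -> None:
    """Filtering: deliver only the live items that match the filter."""
    for item in stream:
        if matches(item):
            deliver(item)


def playback_pipeline(archive: HistoricArchive, matches: Filter, deliver: Deliver) -> None:
    """Playback: the same filter-and-deliver step, fed from the archive."""
    filtering_pipeline(archive.replay(), matches, deliver)


if __name__ == "__main__":
    firehose = [{"text": "PHP at scale"}, {"text": "cat pictures"}]
    archive = HistoricArchive()
    archive_pipeline(firehose, archive)

    def mentions_php(item: Interaction) -> bool:
        return "php" in item["text"].lower()

    live: List[Interaction] = []
    replayed: List[Interaction] = []
    filtering_pipeline(firehose, mentions_php, live.append)
    playback_pipeline(archive, mentions_php, replayed.append)
    print(live, replayed)
```

The point of the sketch is only that the real-time and playback paths differ in where the data comes from, not in what is done with it. How DataSift actually shares code between them is not something the slides let me confirm.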
They use a custom build of PHP 5.3.latest with several optimizations and compiled-in extensions (ZeroMQ, APC, XHProf, Redis, Xdebug). They also developed some internal components:
- Frink, TweetMeme's framework
- Stone, the foundation of their in-house test tools, Hornet and Storyteller (they will probably open-source a fork named Storyplayer).
Unfortunately I wasn’t able to find more details about these. Anyway, here is the presentation:
[slideshare id=17557205&doc=phpandthefirehose2013-130323121454-phpapp01]