Update: 11/04/15 As I said in the article below, Gary Rennie, one of the people doing the experiment finally posted a very detailed account of how they were able to achieve this incredible milestone. Read all about it here
If you still don't follow the Phoenix Framework (Web framework written in Elixir) creator, Chris McCord, you should. In the last week he has been experimenting how far Phoenix can go. He was able to set up a commodity Rackspace server with 40 cores and 128GB of RAM (although it seems he only needed less than 90GB) and 45 other servers to create an astounding 2 million clients with long running Websocket connections and broadcasting messages to all of them at once!
The thread is very interesting so I decided to compile the most interesting bits directly from his Twitter feed. Take a close look!
Chris or someone from the team will probably report the details about this experiment soon, but because of those testings and benchmarks Phoenix master already got some bottleneck fixes and it is pretty ready for prime time! If you were not aware he is writing a book about Phoenix for The Pragmatic Programmer and I recommend you buy the beta as well.
To give some perspective, Websocket connections will probably sit idle and no real app will broadcast to all users at once all the time, this experiment is trying to exercise the most out of the metal. One comparison would be this Node.js experiment holding 600k connections in a small machine but CPU load going up faster. Apples to Oranges comparison, but just to keep perspective.
josevalim: 128k users connected. @chris_mccord broadcasts a message to all in the channel, @TheGazler receives it immediately on the other side.
chris_mccord: @j2h @josevalim @TheGazler yep! 4 cores, 16 gb memory. Single machine
chris_mccord: Heres what a Phoenix app & server look like with 128000 users in the same chatroom. Each msg goes out to 128k users!
chris_mccord: @andrewbrown @josevalim it’s basically this app with some optimizations on phx master
chris_mccord: We just hit 300k channel clients on our commodity 4core 16gb @Rackspace instance. 10gb mem usage, cpu during active run < 40% #elixirlang
chris_mccord: @chantastic @bcardarella +1 to prag book. My RailsConf Elixir workshop also might be a nice companion to the book
chris_mccord: @SkinnyGeek1010 these are 300k individual ws connections (joining a single channel)
chris_mccord: Calling it quits trying to max Channels– at 333k clients. It took maxed ports on 8 servers to push that, 40% mem left. We’re out of servers!
chris_mccord: @bratschecody 300k on a single server, with connections pushed from 8 servers
chris_mccord: To be clear, it was a single 4core server, 16gb. Traffic was pushed from 8 servers to open 333k conns. Out of ports, need more servers
chris_mccord: @perishabledave 30-40% under active load. Once 333k established, idle.
bratschecody: Whenever @chris_mccord speaks it's how they've increased clients a single Phoenix server is handling. 128k, 300k, 330k, 450k. DON'T STOP MAN
chris_mccord: @gabiz we also improved our arrival rate by 10x, now reliably establishing conns at 10k conn/s . Next up is reducing conn size
chris_mccord: Consider what this kind of performance in a framework enables. With Pusher you pay $399/mo for 10k conns. This box is $390/mo for 450k conns
chris_mccord: I’m sure comparable hardware on AWS is even cheaper
chris_mccord: On a bigger @Rackspace box we just 1 million phoenix channel clients on a single server! Quick screencast in action:
chris_mccord: @AstonJ accordingly to recent tests, ~ 2 million uses 58GB
chris_mccord: @perishabledave 40core / 128 gb. These runs consumed 38gb, cpu was very under-utilized. Ran out of ports, so spinning up more boxes
chris_mccord: Thanks to @mobileoverlord and @LiveHelpNow , they are spinning up 30 more servers to help us try for 2 million channel clients on one server
chris_mccord: @AstonJ just standard box with ulimit and file descriptors set higher
chris_mccord: @cbjones1 @Rackspace we are, but we using 45 separate servers to open the connections :)
chris_mccord: @cbjones1 @Rackspace 65k ports per remote ip right?
chris_mccord: Final results from Phoenix channel benchmarks on 40core/128gb box. 2 million clients, limited by ulimit
chris_mccord: The chat app had 2M clients joined to one topic. We sharded pubsub & broadcasts go out in 1-3s. The app is totally snappy at these levels!
chris_mccord: @ashneyderman will publish writeup soon. Only knobs were about a dozen sysctrl/ulimit max files/ports. Stock ubuntu 15.10 otherwise
chris_mccord: One surprising thing with the benchmarks is how well stock ubuntu + dozen max limit options supports millions of conns. No kernel hack req’d
joeerl: @chris_mccord So now you understand why WhatsApp used Erlang :-) - obvious really.
chris_mccord: @lhoguin @felixgallo @joeerl @rvirding I accidentally forgot to set the ulimit higher than 2M, and we’re out of time on the servers now
chris_mccord: @lhoguin @felixgallo @joeerl @rvirding we had 45 tsung clients with capacity to send 2.5M. Every indication says it would’ve been just fine
chris_mccord: @felixgallo @lhoguin @joeerl @rvirding now that we know the gotchas, pretty quick turnaround (few hours with svr setup, coordinating nodes)?
chris_mccord: @felixgallo @lhoguin @joeerl @rvirding @ErlangSolutions devil is in the details tho. We find a bottleneck at 2.25M & spend 48 hours+ fixing
chris_mccord: @mentisdominus it’s a different story when we broadcast to 2M subscribers