Urtak Blog

Ask Questions. Get Answers.

Renewable resource or fossil fuel?

Throughout history, humans have spent countless lifetimes battling against sun, wind, and wave. Their almost limitless power has the capacity to change our lives in an instant. Yet instead of harnessing this energy to meet our needs for electricity, we continue to spend countless lives and huge sums of money removing non-renewable fuels from the ground. We who are young today will pay the price for this shortsightedness over the course of the rest of our lives.

Everyone knows that there are better ways to power our homes and vehicles than with fossil fuels, but structures of power and interest make it practically impossible to make the transition. Energy is treated as something scarce and expensive, when it should be free and abundant.

Last year, in the United States alone, firms spent more than nine billion dollars conducting market research. And yet our knowledge of public opinion is basically zero. Ad campaigns flop, products lose market share, and well-funded political campaigns manage to spectacularly implode. The return on all this opinion research investment is shamefully low.

The current opinion research regime treats people and their minds as commodities. The methods of gathering opinion are extractive. The company bothers you on the phone at dinner time, or pops up a totally unrelated survey on a website you happen to be browsing, or worst of all, someone with a clipboard comes up to you in a public place and disrupts your peace. In all of these cases, the experience is the same. You are presented with a list of stultifying questions with a mystifying purpose, and then your answers are taken away and vanish forever. It’s no wonder that the vast majority of people refuse to participate in polls and surveys, even after they’ve been ambushed.

But people love to answer questions and share their opinions! Everyone is telling everyone else what they think, non-stop. The opinions that we want are out there; we just have no way of collecting them. What percentage of New Yorkers like baseball? How many Americans think Hugo Chávez was a dictator? These are extremely simple questions, and it shouldn’t be hard to answer them, but one thing is clear: with the extractive approach currently in use, not billions, but trillions of dollars would be required to find out what the world is thinking.

We believe that the act of finding out what people think should be creative, not extractive. People love answering questions, and they love asking them too. That is fundamental, since if you are trying to find out what a group of people are thinking, you concede that you do not know what they think in advance. If you don’t know what they think, why assume that you know what questions to ask? Yet this basic mistake is repeated every day by every opinion researcher in the world. With all of the extractive research that money has been wasted on over the years, we know next to nothing about what the world is really thinking.

The opinions are out there. As a researcher of what people think, your focus should be much less on getting your questions answered than on surfacing questions that you hadn’t imagined. With a creative, collaborative approach that empowers the people you contact, your knowledge of the world around you can only increase. The question is the foundation of science. And it is for this reason that we believe that people should have the right to ask questions at all times, in any situation, no matter where they are.

That a resource as valuable as petroleum is being depleted to benefit the greed of an interested few is a terrible waste. But without the construction of global public opinion, global democracy will never be possible. Treating human beings and their minds as commodities to be used and thrown away is worse than a waste: it is a crime.

Christopher Hitchens vs Dick Morris on polling

From a sharp-eyed urtakista in Australia

It sounds like Hitchens might have been a fan of Urtak:

“Triangulation,” he [Dick Morris] writes, “is much misunderstood. It is not merely splitting the difference between left and right.” This accurate objection–we are talking about a three card monte and not even a split–must be read in the context of its preceding sentence: “Polls are not the instrument of the mob; they offer the prospect of leadership wedded to a finely-calibrated measurement of opinion.”

By no means–let us agree once more with Mr. Morris–are polls the instrument of the mob. The mob would not know how to poll itself, nor could it afford the enormous outlay that modern polling requires. (Have you ever seen a poll asking whether or not the Federal Reserve is too secretive? Who would pay to ask such a question? Who would know how to answer it?) Instead, the polling business gives the patricians an idea of what the mob is thinking, and of how that thinking might be changed or, shall we say, “shaped.” It is the essential weapon in the mastery of populism by the elite. It also allows for “fine calibration,” and for capsules of “message” to be prescribed for variant constituencies.

pp. 17-18, “No One Left To Lie To: The Triangulations of William Jefferson Clinton.”

Do you trust CNN?

Yesterday The Daily Beast published an article about new CNN boss Jeff Zucker’s shake-up of the cable news giant. With that article came an Urtak poll, in which over two thousand people participated. What did we learn?

Some very interesting facts. First and foremost, readers of this Beast article have a strong distrust of CNN. Only 21% answered yes to “Do you trust CNN as a news source?” A very low number, it would seem. By way of comparison, 33% answered yes to the user-asked question “Do you trust the New York Times?” However, another result suggested that something was unusual: 63% said they prefer Fox News to CNN. What could explain such reactions from what is a reliably centrist and liberal platform? The “Drudge effect.” So we really should take these results with a grain of salt. Liberals are probably not fleeing CNN.

What might not come as a surprise is the fact that people who trust CNN as a news source are four times more likely to actually watch it. So what would we do if we were in Jeff Zucker’s shoes, and had absolute power over that great outlet? Toss out the fluff, and report the facts.

Getting Blazed

On Tuesday, January 8, 2013, we encountered some significant site performance issues related to a huge deluge of traffic. We’ve had similar, large bursts of traffic before, but this one was just a bit larger and revealed a hidden flaw in our infrastructure that took quite a bit of frantic investigation to find. In retrospect, of course, it should have been much easier to catch; however, in the heat of the moment, it was hard to step back from the situation at hand and recognize patterns in the behavior. So with my inaugural Urtak engineering blog post, I hope I can help anyone else who encounters a similar set of circumstances. Also, I must say that I’m embarrassed that I have remained so quiet in the blogosphere: these write-ups are definitely a public service of sorts that I have been selfishly consuming but not providing.

(Just to set the stage, we have a Ruby on Rails setup that includes a front end server running nginx/HAProxy, backend application servers running Unicorn, worker servers processing Resque jobs, a Sphinx database, a MySQL database, and a Redis database.)

Once we started to see traffic come in at rates of up to a few hundred user responses a second, our initial reaction was to launch a few more backend servers to handle the additional load. This technique usually worked, but this time it was different. Using our handy New Relic monitoring, we clearly saw that our application Apdex score was plummeting due to skyrocketing server response times. Yet our database throughput remained constant, as did the overall CPU usage described in our “scalability analysis.” Surely our backend servers were the bottleneck. “LAUNCH THE TORPEDOES!” We fired up more backends. The performance continued to degrade. We scratched our heads. What. The. Eff?
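For anyone unfamiliar with Apdex, here is a minimal sketch of how the score is usually computed: requests at or under a threshold T count as satisfied, requests between T and 4T count as half, and everything slower counts as zero. The threshold and the sample timings below are made up for illustration and are not our production numbers.

```ruby
# Apdex sketch: score = (satisfied + tolerating / 2) / total samples.
# ASSUMPTION: t (seconds) and the sample timings are illustrative only.
def apdex(response_times, t: 0.5)
  return nil if response_times.empty?

  satisfied  = response_times.count { |s| s <= t }
  tolerating = response_times.count { |s| s > t && s <= 4 * t }
  (satisfied + tolerating / 2.0) / response_times.size
end

healthy  = [0.1, 0.2, 0.3, 0.4, 0.6] # mostly under the threshold
degraded = [0.4, 1.5, 2.5, 3.0, 6.0] # response times climbing

puts apdex(healthy)  # four satisfied, one tolerating
puts apdex(degraded) # one satisfied, one tolerating, three frustrated
```

With these samples the score falls from 0.9 to 0.3 even though the request count is unchanged, which is the kind of drop we were watching.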

In our RightScale monitoring, Paul and I noticed that our Redis server was showing significant CPU usage. We proceeded to investigate. We sshed into our Redis master and tried to initiate redis-cli: “connection timed out.” A-hah! Perhaps we had reached some sort of connection limit for our Redis server. Either that or our server was locked up. (Only later would we find out that our version of Redis, 2.4.5, was incorrectly reporting the error, which later versions display as “too many connections.”)

At this time, we also noticed that while we were trying to connect to our Redis server, our CPU usage on all our worker servers had dropped to zero. Resque-web confirmed that we had collected millions of jobs in our queue, and no workers were running. We saw what we expected there in terms of which jobs were queued. In particular (and relevant to this story) there were millions of counter cache jobs waiting to run: we calculate all our counter caches related to users responding to questions asynchronously using Resque and a gem I wrote called ar-resque-counter-cache.

It also happened that when the workers stopped working (presumably because they were no longer able to connect to Redis–too many connections), the performance of the site was restored.  This confused us, and this is where we really should have figured out what was going on. But we were fixated on the fact that we had reached a connection limit on our Redis server. Our application servers must have been blocking waiting for Redis connections! We deduced from a few forum articles that this limit on connections had to do with our version of Redis.

<mishaps> (Some clumsy Redis administration)
We rushed to upgrade our Chef scripts to include Redis 2.6.8, and proceeded to launch a new Redis server as a second slave of our existing master; however, the new 2.6.8 Redis server would not sync the data. We then stumbled across this list of known bugs in Redis. Perhaps we had encountered “Connection of multiple slaves at the same time could result into big master memory usage, and slave desync.” Alright, we thought, let’s manually scp the dump.rdb from our other slave. We hastily slapped up our maintenance page to prevent new data from entering Redis, and began the scp. Simultaneously, we swapped the DNS entry that our app servers use to connect to our Redis master to point to the newly launched 2.6.8 instance. Once the dump transferred and the short DNS TTL expired, we were ready to go. We moved the dump.rdb into place and booted up the server, only to realize that we had booted it with the slave configuration we had originally used for the migration. Redis-cli revealed no data in the database, and after another few minutes of confusion we realized the mistake and changed the configuration. We restarted the Redis instance and once again NO DATA. The shutdown sequence in the restart had overwritten our dump.rdb with the contents of the empty dataset. Poof, there went our dump! Back to scp… Lucky that wasn’t the only copy of the Redis dump! (ALWAYS BACK UP!)
</mishaps>

Finally, we got the new Redis 2.6.8 instance running with our data! But… immediately, we hit the connection limit. This time 986. Somewhere I saw something about 1018-32… long story short, we changed the open filehandle limit (`ulimit -n 4096`) and flew past 1024 Redis connections. But site performance completely degraded again! If we had been more perceptive we would have realized the obvious correlation between this performance and the fact that our Resque workers began churning jobs again. But why would asynchronous workers have any performance impact on our front end experience?
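As a rough illustration of how the file handle limit caps connections, the sketch below reads the current process's open-file limit from Ruby and subtracts a reserve. The reserve of 32 is the figure I half-remembered above, so treat it as an assumption rather than a documented constant; `client_ceiling` is a hypothetical helper, and `Process.getrlimit` is only available on Unix-like systems.

```ruby
# Read the open-file limit for the current process (what `ulimit -n` reports).
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)

# ASSUMPTION: number of descriptors the server keeps for its own use.
RESERVED_FDS = 32

# Hypothetical helper: descriptors left over for client connections.
def client_ceiling(fd_limit, reserved)
  [fd_limit - reserved, 0].max
end

puts "soft limit: #{soft}, hard limit: #{hard}"
puts "rough client ceiling here: #{client_ceiling(soft, RESERVED_FDS)}"
puts "after ulimit -n 4096: #{client_ceiling(4096, RESERVED_FDS)}"
```

The point is only that the ceiling tracks the descriptor limit of the process that starts the server, so raising it has to happen before the server boots.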

We returned to Redis. Once the workers started again, the CPU usage shot up to 100%. It didn’t make sense that the CPU should be so high: as it says on the Redis FAQ, the CPU should really not be the bottleneck. What was going on? Back to frantic googling. We stumbled across a forum article describing a similar problem someone was having. It turned out he had used poor man’s profiler to discover that he was calling the KEYS command often, which is O(n) in the number of keys in the Redis database. I immediately realized that ar-async-counter-cache, which I had coded quite a while back, made use of a potentially expensive call: it uses LREM, which is O(n) in the length of a list. Our counter_cache queues were 5,000,000+ items long. And each counter_cache job performed an LREM operation on that list! This call was originally designed to make the processing of repeated incrementations more efficient, but little did I think it would come back to bite us so hard in the future when the counter_cache queue grew too large. Poor decision making in my original gem design!
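To make the cost concrete, here is a toy model of a list with an LREM-style removal that counts how many elements it has to look at. It is not Redis and not the gem's code; `ToyList`, the payload strings, and the assumption that each job issues one removal for its own payload are all made up for illustration.

```ruby
# Toy in-memory stand-in for a Redis list, counting element comparisons.
# It is NOT the Redis implementation, only a model of a linear scan.
class ToyList
  attr_reader :comparisons

  def initialize(items)
    @items = items.dup
    @comparisons = 0
  end

  # Modeled on LREM with count 0: remove every element equal to value.
  # Every element has to be inspected, so one call costs O(n).
  def lrem_all(value)
    before = @items.size
    @items.reject! do |item|
      @comparisons += 1
      item == value
    end
    before - @items.size
  end

  def size
    @items.size
  end
end

# Hypothetical payloads: one queued increment per question.
queue = ToyList.new((1..10_000).map { |i| "question:#{i}:responses_count" })

# ASSUMPTION: each job removes duplicates of its own payload before running.
100.times { |i| queue.lrem_all("question:#{i + 1}:responses_count") }

puts queue.size        # items left on the queue
puts queue.comparisons # elements scanned across the 100 calls
```

In this toy run, 100 jobs against a 10,000-item list scan just under a million elements. Scale the list to 5,000,000 items and every single job walks millions of entries on a single-threaded server.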

Using poor man’s profiler we did indeed find LREM was being called over and over. So Redis was blocking due to heavy CPU usage by these LREM calls. We temporarily stopped all of our asynchronous workers. I got to hacking very quickly on the ar-resque-counter-cache gem and released a new version that used a different strategy for limiting the number of jobs necessary to increment counter caches. By the wee hours of the night I was able to deploy the new code, get the other millions of Resque jobs done, and recover the data that was lost in our Redis fiasco. Phew!
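I won't reproduce the gem's new code here, but one way to get the same effect, sketched below, is to coalesce increments in a keyed structure so that merging duplicates never requires scanning a queue. In Redis that role could be played by a hash updated with HINCRBY; in this sketch a plain Ruby Hash stands in so it runs without a server. `CoalescedCounters` and the key names are hypothetical and are not the gem's API.

```ruby
# Sketch of coalescing increments instead of queueing one job per response.
# ASSUMPTION: this illustrates the general idea, not the released gem.
class CoalescedCounters
  def initialize
    @pending = Hash.new(0)
  end

  # O(1) per call: duplicates merge by key, with no list scan.
  def increment(key, by = 1)
    @pending[key] += by
  end

  # Hand each accumulated delta to the block once, then clear the buffer.
  # Returns the number of writes performed.
  def flush
    deltas = @pending
    @pending = Hash.new(0)
    deltas.each { |key, delta| yield key, delta }
    deltas.size
  end
end

counters = CoalescedCounters.new
1_000.times { |i| counters.increment("question:#{i % 10}:responses_count") }

writes = counters.flush { |key, delta| puts "#{key} += #{delta}" }
puts "#{writes} counter writes for 1000 responses"
```

Here a thousand responses spread over ten questions collapse into ten counter updates, and the cost of recording each response no longer depends on how long the backlog is.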

TL;DR
Watch out for potentially expensive Redis operations like LREM! Like @antirez says, Redis really should not be CPU bound.

Would you rather…

Would you rather fight 100 duck-sized horses or one horse-sized duck? 72% of Dish readers agree that taking on a BFD is a dumb idea! More than 1000 people answered the question.