
Want to know the election results before everyone else? On Israel’s election night, March 23, Dicta, a non-profit website that normally applies advanced machine learning and natural language processing algorithms to the analysis of Hebrew texts, will run a dedicated page that uses the same technology to give accurate predictions based on the first true results from the polling stations.

There you’ll find all the official results, updated in real time, along with predictions that are updated accordingly. The Dicta folks promise their page will be chock full of interesting numbers, and, of course, no needless chitchat.

Check out Predicting True Results.

Here’s how it works:

How can we estimate the final results of the election accurately based on reports from 2% of the polling stations? The simple but incorrect answer is that these polling stations adequately represent the rest of the country’s polling stations, so that the final distribution of votes among the parties is reflected by the polling stations we chose.

It’s a reasonable assumption, but something much more sophisticated can be done with these figures: There are more than 12,000 polling stations in the country. Suppose there are true results from 200 of them. For these 200 polling stations, we have, in addition to the fresh results, also the results from the previous elections (to the 23rd Knesset).

Now, if we could find a function that reliably maps the previous election results onto the new results in these polling stations, we could apply the same transformation to the previous results from the rest of the polling stations and (we hope) get a reasonable approximation of the overall results.
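To make the idea concrete, here is a minimal sketch in Python of what "running the same transformation" could look like if the function is a simple table of voter flows. The names apply_transfer and predict_national, the two-party setup, and all the numbers are hypothetical illustrations, not Dicta's actual code or data.

```python
def apply_transfer(prev_votes, transfer):
    """Map one station's previous vote counts to predicted new vote counts.

    prev_votes[i]  = votes for old party i at this station last time
    transfer[i][j] = assumed share of old party i's voters who now pick party j
    """
    n_new = len(transfer[0])
    return [
        sum(prev_votes[i] * transfer[i][j] for i in range(len(prev_votes)))
        for j in range(n_new)
    ]


def predict_national(all_prev_votes, transfer):
    """Apply the same mapping to every station and add up the predictions."""
    totals = [0.0] * len(transfer[0])
    for station in all_prev_votes:
        for j, votes in enumerate(apply_transfer(station, transfer)):
            totals[j] += votes
    return totals


# Hypothetical flows: 80% of old party 0 stays put, 20% moves to party 1, etc.
transfer = [[0.8, 0.2],
            [0.1, 0.9]]
# Hypothetical previous results for two stations that have not reported yet.
stations = [[100, 50],
            [30, 200]]
national = predict_national(stations, transfer)
```

Each row of the table sums to 1 here, so the predicted totals add up to the same number of votes as before; a real model would also have to handle turnout changes and new parties.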

Incidentally, there are beautiful tools in machine learning for just this purpose (see the details at the bottom).

But how do we know if it really works? The Dicta team applied the method to the previous elections. In other words, they randomly picked real results from 2% of the polling stations in the elections to the 23rd Knesset and compared them to the results of the elections to the 22nd Knesset in the same polling stations. Now they will extrapolate in the same way from 2% of the polling stations to calculate the overall results of the 24th Knesset election.

This method is more accurate than simply generalizing from a random set of partial results. It also enables the website to calculate the expected results in each locality separately (it’s also a good tool for catching fraudulent ballots). And they can calculate how party X voters in the previous election voted in the current election. For example, the votes for Moshe Kahlon’s Kulanu party in the 21st Knesset election were distributed as follows in the 22nd Knesset election: 43% Blue&White, 23% Likud, 15% Liberman, 12% Labor-Gesher.

This time it will be interesting to see how former Blue&White voters use their vote.

Some details for the nerds among us: the learning is done using gradient descent with regular backpropagation. In the end, it turned out that there was not even a need for hidden layers; a simple matrix is good enough.
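For the same nerds, here is a rough sketch of what learning "a simple matrix" by gradient descent might look like. It is an illustration under stated assumptions, not Dicta's code: the function name fit_transfer_matrix, the squared-error loss, the use of vote shares rather than raw counts, the learning rate, and the toy data are all choices made for this example. With no hidden layers, backpropagation reduces to the single gradient formula in the inner loop.

```python
def fit_transfer_matrix(prev_shares, new_shares, lr=0.5, steps=500):
    """Fit w so that (previous shares) x w approximates (new shares).

    Plain batch gradient descent on the mean squared error of a single
    linear layer. The matrix is not constrained to be a valid table of
    probabilities; that is a simplification of this sketch.
    """
    n = len(prev_shares)
    n_old, n_new = len(prev_shares[0]), len(new_shares[0])
    w = [[1.0 / n_new] * n_new for _ in range(n_old)]  # uniform starting guess
    for _ in range(steps):
        grad = [[0.0] * n_new for _ in range(n_old)]
        for x, y in zip(prev_shares, new_shares):
            pred = [sum(x[i] * w[i][j] for i in range(n_old)) for j in range(n_new)]
            for i in range(n_old):
                for j in range(n_new):
                    # d/dw[i][j] of the mean of (pred[j] - y[j])**2
                    grad[i][j] += 2.0 * x[i] * (pred[j] - y[j]) / n
        for i in range(n_old):
            for j in range(n_new):
                w[i][j] -= lr * grad[i][j]
    return w


# Toy data: four "reported" stations generated from a known matrix.
true_w = [[0.8, 0.2],
          [0.1, 0.9]]
prev_shares = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
new_shares = [
    [sum(x[i] * true_w[i][j] for i in range(2)) for j in range(2)]
    for x in prev_shares
]
learned = fit_transfer_matrix(prev_shares, new_shares)
```

On this noiseless toy data the learned matrix ends up very close to the one that generated it. Real station results are noisy, so the fit would only be approximate.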

There are all sorts of little problems that are likely to disrupt the machine learning: new polling stations, and a great expansion of the number of voting stations for Corona-related cases. Such phenomena have also existed in the past and have not disrupted the process significantly.

The Dicta people hope that the official results will be uploaded to the web in an easy-to-use format, but right now it’s hard to tell what will happen. After calculating the number of votes expected for each party, they’ll drop the parties that did not pass the threshold percentage and distribute the seats among the remaining parties according to the Bader-Ofer law.
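That last step can be sketched in a few lines of Python. This is a simplified illustration, not Dicta's implementation: the function name allocate_seats and the vote counts are made up, and the surplus-vote agreements between pairs of parties, which the Bader-Ofer law also takes into account, are left out. The 120 seats and the 3.25% threshold are the Knesset's actual figures.

```python
def allocate_seats(votes, total_seats=120, threshold=0.0325):
    """Drop parties below the threshold, then allocate seats Bader-Ofer style.

    Simplification: surplus-vote agreements between parties are ignored.
    """
    total_valid = sum(votes.values())
    qualified = {p: v for p, v in votes.items() if v >= threshold * total_valid}
    qualified_total = sum(qualified.values())

    # First pass: one seat per full quota (quota = qualified votes / seats).
    seats = {p: int(v * total_seats // qualified_total) for p, v in qualified.items()}

    # Remaining seats go, one at a time, to the party with the highest
    # votes-per-seat figure if it were given one more seat.
    while sum(seats.values()) < total_seats:
        best = max(qualified, key=lambda p: qualified[p] / (seats[p] + 1))
        seats[best] += 1
    return seats


# Hypothetical predicted vote totals; party D falls below the threshold.
predicted = {"A": 500_000, "B": 300_000, "C": 180_000, "D": 20_000}
seats = allocate_seats(predicted)
```

In this made-up example party D is cut, the first pass hands out 119 seats, and the one remaining seat goes to party B.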

Now, the commercial channels announce their predictions as the polls close at 10 PM, based on their samples. Dicta will not run a sample and won’t pre-empt the TV channels. They’ll only start making predictions after real results start trickling in. From the moment real results from at least 2% of the polling stations are reported, they’ll publish estimates for each polling station and city, and of course the general results, and they will continue to improve the models as more true results come in.

The credit for building the entire system belongs to the amazing Shaltiel Shmidman and Professor Moshe Koppel.