support Dominion format #119

tarheel · 2018-08-11T23:08:11Z

George's comment from #96: "I believe the guys at FairVote have the Dominion data from Santa Fe."

tarheel · 2019-01-02T05:49:26Z

@gngilbert @CalebKleppner do you know how we can get this data?

gngilbert · 2019-01-06T04:49:15Z

I believe I have the data from Santa Fe. I will try to get it uploaded tomorrow and we can discuss it. I could not figure it out. We may have to talk to Dominion.

gngilbert · 2019-01-07T22:32:46Z

I think I have attached the Santa Fe CVR files here. Not sure how this works. If they are not attached, I can send them attached to an e-mail.

drive-download-20181214T175737Z-001.zip

nealmcb · 2019-04-08T15:39:49Z

Thanks - fascinating.
Wow - that is such an awful way to deliver CVRs of ranked ballots: a bloated set of CSVs with 205 columns for perhaps 21 candidates, enormous column names over 40 chars each, and so much extraneous and badly-organized data. And you have to join at least 3 csv files together to see contest and candidate names!
I dare say I'd much prefer the json-format output that I expect they have (and used to produce this output).

gngilbert · 2019-04-11T16:07:52Z

That was my reaction as well and why we have not taken on the task of converting it to run in the RCVRC Tabulator. At some point, however, we need to do this and I am looking for recommendations as to how to do this in the most efficient manner.

gngilbert · 2019-04-11T16:08:49Z

Jon, should you bring David in on this issue? (I would but don't know how. Thanks.)

tarheel · 2019-04-12T17:05:00Z

@davidryal

gngilbert · 2019-06-09T17:55:38Z

We'll keep this on the shelf for now.

chughes297 · 2020-01-28T15:21:15Z

Pedro at FairVote has been working with the San Francisco CVRs I'm attaching to this post, and developed this process for converting the JSON into a human-readable format:
https://docs.google.com/document/d/1uR94xFn-oB3B_17lftP2gZkLZAGtvs6Wu5rw2vryDsE/edit?usp=sharing
Wanted to share in case it's useful as you guys get started on Dominion.
CVR_Export_20191125163446.zip

tarheel · 2020-02-10T06:57:28Z

More questions from @moldover:

Do we need to handle multiple contests?
How are column headers parsed? There's a bunch of different text:
    Original/Cards/0/Contests/0/Marks/0/Rank
    Original/Cards/0/PaperIndex
    Original/Cards/0/Contests/0/Marks/8/MarkDensity
Do we need to interpret selections, using the outstack condition manifest?
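
On the column-header question, here is a minimal sketch of one way to read those headers. It assumes (not confirmed by Dominion) that each header is a slash-separated path into the nested CVR record, and that a numeric segment is an array index belonging to the name just before it. `parse_header` is a hypothetical helper for illustration, not part of the tabulator.

```python
def parse_header(header):
    """Split a flattened column header into its array indices and final field.

    Assumption: a numeric token is the index of the name that precedes it,
    e.g. "Cards/0" means element 0 of the Cards array.
    """
    tokens = header.split("/")
    indices = {}
    for name, following in zip(tokens, tokens[1:]):
        if following.isdigit():
            indices[name] = int(following)
    return {"indices": indices, "field": tokens[-1]}


headers = [
    "Original/Cards/0/Contests/0/Marks/0/Rank",
    "Original/Cards/0/PaperIndex",
    "Original/Cards/0/Contests/0/Marks/8/MarkDensity",
]
parsed = [parse_header(h) for h in headers]
for header, info in zip(headers, parsed):
    print(header, "->", info)
```

If that reading is right, the third header would be the MarkDensity of mark 8 in contest 0 on card 0, which suggests the CSV is just the JSON structure flattened one column per leaf.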

gngilbert · 2020-02-11T16:44:48Z

I'm not familiar enough with any of this to provide answers. Do we need to bring Keith in on this?

tarheel · 2020-02-11T16:49:58Z

These are the same questions that Jon asked previously over email; I'm just reproducing them here to keep things organized. Presumably Keith has already shared them with Dominion folks.

gngilbert · 2020-02-11T16:59:06Z

Thanks, Louis.

nealmcb · 2020-02-11T18:50:05Z

Thanks for digging out the data and a parsing method. I'm pretty sure I've parsed similar Dominion JSON files before in Python.

Also, as yet another approach, the RLA (SHANGRLA) audit of the election seems to have used the JavaScript in this file to parse out just the votes on each ballot in a given contest:

https://github.com/pbstark/SHANGRLA/blob/master/ConvertCVRToRAIRE.html

The RAIRE format (for later processing) is a CSV file.
First line: number of contests.
Next, a line for each contest
Contest,ID,N,C1,C2,C3 ...
ID is the contest ID
N is the number of candidates in that contest
and C1, ... are the candidate IDs relevant to that contest.
Then a line for every ranking that appears on a ballot:
Contest ID,Ballot ID,R1,R2,R3,...
where the Ri's are the unique candidate IDs.

Yielding this output file for Mayor

But I've only glanced at that stuff - I might have misinterpreted something there....
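
For reference, a rough sketch of reading the RAIRE layout as described above. It follows only the description in this comment (count line, then one `Contest,...` line per contest, then one line per ranking), so it may not match the real files exactly. The sample data and the `parse_raire` name are made up for illustration.

```python
import csv
import io


def parse_raire(text):
    """Parse RAIRE-style CSV text into (contests, ballots).

    contests maps contest ID -> list of candidate IDs.
    ballots is a list of dicts with contest, ballot, and ranking keys.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    n_contests = int(rows[0][0])
    contests = {}
    for row in rows[1:1 + n_contests]:
        # Layout assumed from the description: Contest,ID,N,C1,C2,...
        contest_id, n_candidates = row[1], int(row[2])
        contests[contest_id] = row[3:3 + n_candidates]
    ballots = []
    for row in rows[1 + n_contests:]:
        # Layout assumed from the description: Contest ID,Ballot ID,R1,R2,...
        ballots.append({"contest": row[0], "ballot": row[1], "ranking": row[2:]})
    return contests, ballots


# Hypothetical sample, not taken from any real election file.
sample = """1
Contest,339,3,15,16,17
339,1-1-1,15,17
339,1-1-2,16
"""
contests, ballots = parse_raire(sample)
print(contests)
print(ballots)
```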

CalebKleppner · 2020-02-17T18:43:01Z

All -- I see that you've gotten started on working with Dominion JSON files for CVRs. This is good, as I'm sure you've heard that the Dem RCV contests in Alaska, Wyoming and Kansas have recently switched from ClearBallot to Dominion equipment. Yes, we are just 7 weeks away from the elections, and the contractor has changed equipment. This means that the cast vote records will be in the JSON format that San Francisco uses ( https://www.sfelections.org/results/20191105/data/20191125/CVR_Export_20191125163446.zip ).

I will probably be conducting the RCV tallies for these 3 states and will be relying on the Universal Tabulator. I'm very eager to start testing it with San Francisco data. Should I plan on obtaining CVRs in Dominion JSON format, converting them to a human-readable format as Pedro worked out, and then tabulating in the Universal Tabulator?

I believe that I'll be receiving a large set of CVRs that include both mail ballots and precinct ballots, so it will be necessary for me to break the CVRs into separate files based on county (Wyoming) or CD (Kansas).

Thanks for any info you can provide. I'm happy to discuss anytime (203-376-4080, ck@fairvote.org).

Best,
Caleb

gngilbert · 2020-02-17T20:16:38Z

All, obviously this is very short notice on this change of vendor. I think we need to plan a conference call with Caleb for later this week. I'll ask Chris to send out a Doodle poll.

tarheel · 2020-02-21T19:43:48Z

Here's a parser that @catrope wrote for the new format: https://github.com/catrope/sf-rcv/blob/master/parse-new-format.js

And the rest of the code in that repo handles the old format.

More explanation from him:

I apologize for the lack of documentation, so I'll briefly explain it here instead.
1. Download a ZIP file from the SF elections web site (e.g. this one)
2. Create a new directory and extract the ZIP file into it
3. Run my script in this directory, and capture its output in a file
So, in a terminal/shell:

$ wget https://www.sfelections.org/results/20191105/data/20191114/CVR_Export_20191114160248.zip
$ mkdir sf-20191114
$ cd sf-20191114
$ unzip ../CVR_Export_20191114160248.zip
$ node parse-new-format.js > reformatted.json

The attached screenshot illustrates what the output looks like. It's a big array, where every element is a ballot card, and every race is an array of choices. For RCV contests, the first element is the first choice, the second element the second choice, etc.; for non-RCV contests (including measures), there is only one element. If a choice is null, it was left blank (undervote); if the choice is itself an array of multiple values, multiple choices were selected (overvote). So in the attached example, the District Attorney rankings were 1) blank, 2) Loftus, 3) overvote for both Tung and Boudin, 4) Dautch, and the Mayor rankings were 1) Breed, 2) Pang, 3) Ventresca, 4) blank, 5) Zhou, 6) Jordan+Robertson overvote.
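
To make the null/array convention concrete, here is a small sketch of how another script might classify each rank in one race. It assumes candidates appear as plain strings once `reformatted.json` is loaded (the screenshot is not reproduced here, so that detail is a guess), and `classify_choice` and `describe_rankings` are hypothetical names. The sample list mirrors the District Attorney example above.

```python
def classify_choice(choice):
    """Classify one rank: None is an undervote, a list is an overvote."""
    if choice is None:
        return "undervote"
    if isinstance(choice, list):
        return "overvote"
    return "vote"


def describe_rankings(choices):
    """Return (rank, kind, choice) tuples for one race, with ranks starting at 1."""
    return [(rank, classify_choice(choice), choice)
            for rank, choice in enumerate(choices, start=1)]


# Assumed shape of one race's choices, following the District Attorney example.
district_attorney = [None, "Loftus", ["Tung", "Boudin"], "Dautch"]
described = describe_rankings(district_attorney)
for rank, kind, choice in described:
    print(rank, kind, choice)
```

How a tabulator should treat the undervote at rank 1 or the overvote at rank 3 (skip it, or exhaust the ballot) is a rules question this sketch does not try to answer.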

I haven't yet adapted my RCV code to ingest this format, but it shouldn't be too much work, and the format should be relatively easy to deal with for other scripts as well. Since the data for non-RCV races is also all there, you should also be able to compute correlations between contests that appear on the same card (e.g. local measures: how many people voted Yes on A but No on E or vice versa, and where were they located?). One thing I want to look at at some point is the geographic distribution of Nancy Tung's transferred votes: Tung->Loftus, Tung->Boudin and Tung->exhausted were each over 30%, and I'm curious to see if those three groups are concentrated anywhere in particular. I also want to look at the second choices of Boudin and Loftus voters.

HEdingfield · 2020-03-28T19:01:15Z

Latest update: I believe we mostly have this issue addressed with the closing of #404, #406, #407, #408, and #415.

Remaining related open issues (which could probably supersede the need to keep this one open): #434, #437, #438.

@moldover @tarheel, could you please look closely over this issue and file any other necessary issues to address any last loose ends here? Then I think we should be good to close it.

tarheel · 2020-03-28T21:09:56Z

Sounds right to me. Will let @moldover make the final call.

moldover · 2020-09-24T14:27:15Z

Closed via #470

tarheel assigned gngilbert Aug 11, 2018

CalebKleppner added this to the v 1.0 Submission milestone Aug 31, 2018

HEdingfield removed this from the v 1.0 Submission milestone Aug 31, 2018

CalebKleppner added this to the v 1.0 Submission milestone Aug 31, 2018

HEdingfield modified the milestone: v 1.0 Submission Aug 31, 2018

tarheel unassigned gngilbert Mar 8, 2019

gngilbert added the reviewed-by-george label Jun 9, 2019

freedomcounts added the reviewed-by-keith label Jun 10, 2019

chughes297 added the reviewed-by-chris label Jun 10, 2019

tarheel added the enhancement and new input format labels and removed the reviewed-by-chris label Dec 2, 2019

tarheel added the 2020 primaries and 2020 primaries essential labels Feb 10, 2020

tarheel assigned moldover Feb 25, 2020

HEdingfield modified the milestones: v 1.0 Submission, v 1.x Future Aug 5, 2020

moldover closed this as completed Sep 24, 2020
