Tuesday, March 10, 2015

Computer-Assisted Reporting-Tipsheets & Links

Digging for Truth with Data: Computer-Assisted Reporting

Editor's Note: When Computer-Assisted Reporting was first published in 1995, data journalism was still in its relative infancy. But a growing legion of journalists were discovering the importance of using computing power to gather, analyze and present stories. Author Brant Houston was among the pioneers, doing data-fueled reporting projects at U.S. newspapers during the 1980s and early '90s. Houston joined Investigative Reporters and Editors in 1994 to direct what became the National Institute of Computer-Assisted Reporting, expanding NICAR from a handful of boot camps to 50 workshops a year. Interest in CAR -- computer-assisted reporting -- exploded, both in the United States and then overseas. Take the latest NICAR conference, for example -- nearly 1,000 attendees went to the annual conference devoted to data journalism (check out the slides, links, and tutorials here).
Houston likes to call investigative reporting the R&D department of journalism. The media's now widespread embrace of data journalism has certainly proved him right on that point. Over 20 years, Computer-Assisted Reporting has helped train and educate thousands of reporters while helping usher in an extraordinary new era of data-driven journalism. With this newly revised, fourth edition, Houston has now expanded on his previous work. We at GIJN are pleased to reprint the introduction to this latest look at how to use the tools of the trade.

It is in computer-assisted reporting where the real revolution is taking place, not only on the big analytical projects, but also in nuts-and-bolts newsgathering. New tools and techniques have made it possible for journalists to dig up vital information on deadline, to quickly add depth and context.
—“We’re All Nerds Now,” Joel Simon and Carol Napolitano, The Columbia Journalism Review (1999)
CAR gives journalists the opportunity to dig for truth in data, and the comparative analysis that a computer can do often reveals pertinent questions. What reporters are able to learn from using CAR provides readers with knowledge and insights that can cut through the clutter of opinionated noise and celebrity obsession. It also can allow even relatively small news operations to delve into problems affecting the global community, yet speak to readers and viewers right around the block.
—“The Benefits of Computer-Assisted Reporting,” Jason Method, Nieman Reports (2008)

The words in the first quote were written as the 20th Century was coming to an end, but they remain true as we move deeper into the 21st Century. The words in the second quote, nearly a decade later, show how crucial computer-assisted reporting has become in creating credibility and in recognizing the globalization of news. But there is still a revolution going on in journalism when it comes to data, both at basic and extraordinarily high levels.
In the past decade, software for analysis has continued to become much simpler to use. An overwhelming amount of data is now online and easy to download. Storage space is immense on hard drives, flash drives, and in the “Cloud.” The computing power on a laptop, tablet, or mobile phone dwarfs the power available only a few years ago. The ability to visualize data for better understanding and analysis has become pro forma. Furthermore, a new generation of computer programmers has joined traditional journalists to tackle the problems of capturing data from the Web, cleaning and organizing it, and creating fascinating presentations to be shared with the public and to encourage citizen participation and analysis.
At the same time, many fundamental truths remain the same. Databases are still created by people, and thus they naturally have omissions and errors that people have made and that must be noted and corrected. Every database also is a slice in time and thus is outdated the moment it is acquired and used.
Also remember that a database alone is not a story. Instead, it is a field of information that needs to be harvested carefully with insight and caution. It needs to be compared with and augmented with observation and interviews.
Farmsubsidy.org, founded by EU Transparency, is a campaign by journalists and activists. It provides detailed information on farm subsidies in every EU member state.
More important than ever is determining the accuracy of a database before using it. Equally important is careful analysis of the data, since one small error can result in monstrously wrong conclusions. The idea of uploading data on the Web and hoping the public or volunteers will consistently make sense of it with reliable analysis has proven unreliable. In fact, journalists—not advocates—are needed more than ever to deliver a well-researched understanding of information and data, and to tell a compelling story using data. Yet, despite changes in technology and the availability of mega-data, some scenarios have not changed.
As you will learn in this book, the techniques described in these scenarios are known as computer-assisted reporting, also referred to as CAR, and they are a part of everyday journalism. Journalists use these and other techniques for daily reporting, reporting on the beat, and for the large projects that win Pulitzer Prizes.
Computer-assisted reporting does not refer to journalists sitting at a keyboard writing stories or surfing the Web. It refers to downloading databases and doing data analysis that can provide context and depth to daily stories. It refers to techniques of producing tips that launch more complex stories from a broader perspective and with a better understanding of the issues. A journalist beginning a story with the knowledge of the patterns gleaned from 150,000 court records is way ahead of a reporter who sees only a handful of court cases each week.
Computer-assisted reporting doesn’t replace proven journalistic practices. It has become a part of them. It also requires greater responsibility and vigilance. The old standard—“verify, verify, verify”— that one learns in basic reporting classes becomes ever more critical. “Healthy skepticism” becomes ever more important. The idea of interviewing multiple sources and cross-referencing them becomes ever more crucial.
“Computers don’t make a bad reporter into a good reporter. What they do is make a good reporter better,” Elliot Jaspin, one of the pioneers in computer-assisted reporting, warned three decades ago. Many practicing journalists have sought training in the past 20 years and have become proficient in the basic skills of computer-assisted reporting. They have overcome computer and math phobia, and they now put these skills to use on a daily basis. And this has led to more precision and sophistication in their reporting.
To quote Philip Meyer, a pioneer in database analysis for news stories, “They are raising the ante on what it takes to be a journalist.” Aiding in the progress and acceptance of these skills has been the proliferation of the Web and social media, the development of inexpensive and easy-to-use computers and software, and the increased attention to the value of data and techniques of analysis in newsrooms.
Computer-assisted reporting is no longer a sidebar to mainstream journalism. It is essential to surviving as a journalist in the twenty-first century. The tools of computer-assisted reporting won’t replace a good journalist’s imagination, ability to conduct revealing interviews, or talent to develop sources. But a journalist who knows how to use computers in day-to-day and long-term work will gather and analyze information more quickly, and develop and deliver a deeper understanding. The journalist will be better prepared for interviews and be able to write with more authority. That journalist also will see potential stories that would have never occurred to him or her.
The journalist also will achieve parity with politicians, bureaucrats, and businessmen who have enjoyed many advantages over the journalists simply because they had the money and knowledge to utilize databases and digital information before journalists did. Government officials and workers have long been comfortable entering information into computers and then retrieving and analyzing it. Businesses, small and large, routinely use spreadsheet and database software. Advocacy groups frequently employ databases to push their agendas.
Without a rudimentary knowledge of the advantages and disadvantages of data analysis, it is difficult for the contemporary journalist to understand and report on how the world now works. And it is far more difficult for a journalist to do meaningful public service journalism or to perform the necessary watchdog role.
For years, journalists were like animals in a zoo, waiting to be fed pellets of information by the keepers who are happy for journalists to stay in their Luddite cages. But a good journalist always wants to see original information, because every time other people select or sort that information, they can add “spin” or bias, which can be tough to detect. Computer-assisted reporting can help prevent that from happening.
Many journalists and journalism students now learn the basic tools of computer-assisted reporting because they realize that it is the best way to get to the information since most governmental and commercial records are now stored electronically. Despite security concerns, there still are a mind-boggling number of databases on government and international Websites. So without the ability to deal with electronic data, a journalist is cut off from some of the best and untainted information. The old-fashioned journalist will never get to the information on time—or worse, will be brutally trampled by the competing media.
For a journalist or journalism student, this knowledge also is crucial in the competition to get a good job. At many news organizations, an applicant who has these skills—which are far more than the ability to surf the Web—gets his or her résumé moved to the top of the stack.
A journalist does not have to be a programmer or someone who knows software code, although that also can make a huge difference. A journalist who can use a spreadsheet or database manager is free to thoroughly explore information, reexamine it, and reconsider what it means in relation to interviews and observations in the field. The journalist can take the spin off the information and get closer to the truth. A journalist may not be a statistician, but a good journalist knows enough about statistics to know how easy it is to manipulate them or lie with them. In the same way, if a journalist understands how data can be manipulated, he or she can better judge a bureaucrat’s spin on the facts or a government’s misuse of a database.
Journalists have found, too, that if they let a person whose job is only to process data do the analysis, nuances or potential pitfalls of the data may be missed. A data programmer also does not necessarily think like a journalist; what may be significant for the journalist may seem unimportant to the programmer. Using a data programmer to do all the work is like asking someone else to read a book for you.
The conscientious journalist also does not want to fall into a cycle of asking for a report in some frozen digital format, studying the report, coming up with more questions, and then asking for another report. Why get into a lengthy back-and-forth when you can engage in a rapid, multidimensional conversation with the data on your computer screen?
Most important, computer-assisted reporting is at the heart of public service journalism and of vigilant daily reporting. This is true whether writing about education, business, government, environmental issues, or any other topic.

This excerpt is from the just-released fourth edition of Computer-Assisted Reporting: A Practical Guide by Brant Houston, and is reprinted by permission of Routledge.
Brant Houston (@branthouston) is the Knight Chair in Investigative Reporting at the University of Illinois at Urbana-Champaign. He is board chair of the Global Investigative Journalism Network and oversees the community news project CU-CitizenAccess.org. From 1997 to 2007, he served as executive director of IRE. He is author of the newly revised Computer-Assisted Reporting.


Tipsheets & Links

Session materials from panels, demos and hands-on classes at the 2015 CAR Conference. Were you a NICAR15 speaker with materials we don't yet have on this list? Send them to tipsheets@ire.org.

WEDNESDAY

Secrets of covering money | 1:30 p.m. in International 8-9
Techraking <=10: bootcamping the news | 1:00 p.m. in International 10

THURSDAY

Welcome and overview of the conference | 8:30 a.m. in International 8-9
No tipsheet available 
Key data for investigating universities | 9:00 a.m. in International 2-3
Investigating caregivers | 9:00 a.m. in International 4-5
Spotlight: Being a reporter when everyone’s a journalist and there’s data everywhere | 9:00 a.m. in International 6-7
No tipsheet available 
Getting started: Intro to CAR and the conference | 9:00 a.m. in International 8-9
No tipsheet available 
From words to pictures: Text analysis and visualization | 9:00 a.m. in International 10
Interactive data graphics in Tableau Public | 9:00 a.m. in M101
No tipsheet available 
Intermediate / advanced Python | M102
Getting started with open-source database manager MySQL | 9:00 a.m. in M103
No tipsheet available 
PyCAR – Team A | M104
No tipsheet available 
Excel magic | 9:00 a.m. in M105
PyCAR – Team B | M106
No tipsheet available 
Twitter tricks and analytics | 9:00 a.m. in M107
Map Camp | M109
No tipsheet available 
The forgotten history of data journalism | 10:10 a.m. in International 2-3
No tipsheet available 
Tracking diseases | 10:10 a.m. in International 4-5
Do it once (and only once) | 10:10 a.m. in International 6-7
What the hell is R and all the other questions you’re afraid to ask | 10:10 a.m. in International 8-9
Interactives without programming: A new tool for building data-driven, interactive content | 10:10 a.m. in International 10
No tipsheet available 
Advanced design and interaction in Tableau Public | 10:10 a.m. in M101
No tipsheet available 
Excel for business and economics | 10:10 a.m. in M103
No tipsheet available 
Make Photoshop work for you | 10:10 a.m. in M105
Designing database applications to increase page views and ad revenues | 10:10 a.m. in M107
No tipsheet available 
Uncovering racial and economic divides using data | 11:20 a.m. in International 2-3
Data from scratch: When data don’t exist | 11:20 a.m. in International 4-5
Digging deeper with the web | 11:20 a.m. in International 6-7
No tipsheet available 
Spotlight: A conversation with Hanna Wallach – lessons from computational social sciences | 11:20 a.m. in International 8-9
No tipsheet available 
Tell stories about your community with real estate data | 11:20 a.m. in International 10
Intermediate / advanced Python | 11:20 a.m. in M102
OpenRefine | 11:20 a.m. in M103
No tipsheet available
Getting started with Access | 11:20 a.m. in M105
No tipsheet available
First steps with R | 11:20 a.m. in M107
CAR on the beat | 2:10 p.m. in International 2-3
Make every international story a data story | 2:10 p.m. in International 4-5
Space journalism: Using satellite imagery for data projects | 2:10 p.m. in International 6-7
The latest on open records | 2:10 p.m. in International 8-9
No tipsheet available
Plotly: Making & sharing beautiful, engaging graphs | 2:10 p.m. in International 10
No tipsheet available
Intro to data stories in Tableau Public | 2:10 p.m. in M101
No tipsheet available
Stats: An introduction | 2:10 p.m. in M103
No tipsheet available
Counting and summing with Access | 2:10 p.m. in M105
Grabbing data from websites without scraping | 2:10 p.m. in M107
Analyzing jobs in your community | 3:20 p.m. in International 2-3
I screwed up. I survived and you can, too | 3:20 p.m. in International 4-5
No tipsheet available
Humanizing numbers | 3:20 p.m. in International 6-7
Data journalism in the university: Making the paradigm shift | 3:20 p.m. in International 8-9
No tipsheet available
Bedfellows: Explore the relationships between PAC donors and recipients | 3:20 p.m. in International 10
No tipsheet available
Advanced calculations and analysis in Tableau Public | 3:20 p.m. in M101
No tipsheet available
Stats: Basic linear regression | 3:20 p.m. in M103
No tipsheet available
Joining tables with Access | 3:20 p.m. in M105
No tipsheet available
Import.io: Web scraping without coding | 3:20 p.m. in M107
Map Camp | 3:20 p.m. in M109
No tipsheet available
Tell me a story I won’t forget | 4:30 p.m. in International 2-3
Teaching data journalism: Your best ideas | 4:30 p.m. in International 4-5
No tipsheet available
Investigating business with data | 4:30 p.m. in International 6-7
Broadcast: Viz, quick hits and the data you need | 4:30 p.m. in International 8-9
Let’s find the open and unsecured | 4:30 p.m. in International 10
Stats: Logistic regression | 4:30 p.m. in M103
No tipsheet available
Make the command line work for you | 4:30 p.m. in M105
No tipsheet available
Building better maps with Leaflet, Mapbox and JavaScript | 4:30 p.m. in M107

FRIDAY

The new ecosystem of health care data | 9:00 a.m. in International 2-3
VR for the NICAR world: Immersive data | 9:00 a.m. in International 4-5
No tipsheet available
The year in CAR | 9:00 a.m. in International 6-7
Bridging the developer / journalist gap | 9:00 a.m. in International 8-9
No tipsheet available
Moonshining: Indexes pack high-proof explanatory power | 9:00 a.m. in International 10
GitHub 101: The basics | 9:00 a.m. in M101
No tipsheet available
Build your first news app | M102
Getting started in Excel | 9:00 a.m. in M103
No tipsheet available
Map like a pro with ArcGIS Online | 9:00 a.m. in M104
No tipsheet available
Regular expressions for beginners | 9:00 a.m. in M105
No tipsheet available
Fusion Tables for beginners | 9:00 a.m. in M106
No tipsheet available
Useful command line tools for reporters | 9:00 a.m. in M107
Amazon Cloud basics | 9:00 a.m. in M109
No tipsheet available
Visual journalism for tiny news desks | 10:10 a.m. in International 2-3
Thinking about interactivity | 10:10 a.m. in International 4-5
No tipsheet available
Watchdogging public spending | 10:10 a.m. in International 6-7
Red alert: Tools to automatically generate story leads | 10:10 a.m. in International 8-9
Introducing Geomancer: Don’t let your data be lonely tonight | 10:10 a.m. in International 10
No tipsheet available
How I learned to take command of the command line: A journalist’s guide to getting started | 10:10 a.m. in M101
Using formulas in Excel | 10:10 a.m. in M103
No tipsheet available
Making timelines | 10:10 a.m. in M104
Data alchemy | 10:10 a.m. in M105
No tipsheet available
Getting started with Python | 10:10 a.m. in M107
Crowdsourcing with Google Forms | 10:10 a.m. in M109
No tipsheet available
Color (and shape and place) my world | 11:20 a.m. in International 2-3
Local data that can lead to stories | 11:20 a.m. in International 4-5
Machine learning in the wild - #wins and #fails | 11:20 a.m. in International 6-7
Jobs and career straight-talk: For (and by) young’uns only | 11:20 a.m. in International 8-9
No tipsheet available
Demo: Tools for cracking PDFs | 11:20 a.m. in International 10
Web scraping using Python | 11:20 a.m. in M101
No tipsheet available
PivotTables in Excel | 11:20 a.m. in M103
No tipsheet available
Introduction to web programming | 11:20 a.m. in M104
No tipsheet available
Simple stats in Excel | 11:20 a.m. in M105
Visualizing your data with R | 11:20 a.m. in M106
Using data to detect environmental dangers | 2:10 p.m. in International 2-3
Critical questions to ask of studies, press releases and scientific reports | 2:10 p.m. in International 4-5
Stat-a-thon | 2:10 p.m. in International 6-7
No tipsheet available
Life after Excel and Access – the next steps | 2:10 p.m. in International 8-9
No tipsheet available
Scraping, APIs and data extraction | 2:10 p.m. in International 10
No tipsheet available
Advanced Django for data analysis | 2:10 p.m. in M101
No tipsheet available
Teach yourself to code | 2:10 p.m. in M102
Mini Boot Camp – Excel | 2:10 p.m. in M103
No tipsheet available
Mapbox | 2:10 p.m. in M104
Mini Boot Camp – Excel | M105
No tipsheet available
Getting data into Excel | 2:10 p.m. in M106
Intro to D3 | 2:10 p.m. in M107
Getting started with JavaScript | 2:10 p.m. in M109
Deep dives part 2 | 3:20 p.m. in International 2-3
No tipsheet available
Data behind the news | 3:20 p.m. in International 4-5
No tipsheet available
Free tools | 3:20 p.m. in International 6-7
No tipsheet available
Spotlight: Using abstraction to gain knowledge from numbers | 3:20 p.m. in International 8-9
Risk adjustment basics | 3:20 p.m. in International 10
No tipsheet available
GitHub 201: Leveling up with the command line | 3:20 p.m. in M102
No tipsheet available
Analyzing networks with Gephi | 3:20 p.m. in M106
Playing with Arduino | 3:20 p.m. in M107
Mapping JS: Building narrative with geo data + CartoDB | 3:20 p.m. in M109

SATURDAY

This just in: Data for breaking news | 9 a.m. in International 2-3
Processes, standards and documentation for data-driven projects | 9 a.m. in International 4-5
Design/Viz: What to do, and what not to do | 9 a.m. in International 6-7
No tipsheet available
Getting it the rightest you can | 9 a.m. in International 8-9
Defense against the dark arts: Security for you and your sources | 9 a.m. in International 10
No tipsheet available
Ruby 1: Introduction | 9 a.m. in M101
Advanced Fusion Tables | 9 a.m. in M105
No tipsheet available
Do more in R with these useful new packages | 9 a.m. in M106
No tipsheet available
Create your own interactive newsgame without coding | 9 a.m. in M107
Sensor journalism: Buzz or BS? | 10:10 a.m. in International 2-3
    How to build a happy team | 10:10 a.m. in International 4-5
    No tipsheet available
    (Keep) following the money on state and local politics | 10:10 a.m. in International 6-7
    No tipsheet available
    Social media sleuthing | 10:10 a.m. in International 8-9
    No tipsheet available
    Building mobile-ready visualizations and maps in minutes with Silk | 10:10 a.m. in International 10
    Ruby 2: Acquiring and transforming data | 10:10 a.m. in M101
    No tipsheet available
    Text editing with Regular Expressions | 10:10 a.m. in M103
    No tipsheet available
    Advanced Python for data analysis: Part 1 | 10:10 a.m. in M104
    No tipsheet available
    Advanced SQL for analysis | 10:10 a.m. in M105
    R: Preplication | 10:10 a.m. in M106
    Municipal bonds 101 | 10:10 a.m. in M107
    Tools for cracking PDFs | 10:10 a.m. in M109
    Deep dive: Philip Meyer Award winners | 11:20 a.m. in International 2-3
    No tipsheet available
    Go home and share your work! | 11:20 a.m. in International 4-5
    No tipsheet available
    Making data-informed design decisions | 11:20 a.m. in International 6-7
    Policing the police | 11:20 a.m. in International 8-9
    Intermediate/Advanced Security | 11:20 a.m. in International 10
    Ruby 3: Simple web apps with Ruby | 11:20 a.m. in M101
    Advanced Python for data analysis: Part 2 | 11:20 a.m. in M104
    No tipsheet available
    Under pressure: Real life in real time with breaking news | 11:20 a.m. in M106
    Machine learning | 11:20 a.m. in M107
    No tipsheet available
    So they bought a federal candidate, now what? | 2:10 p.m. in International 2-3
    Catching fire: Spreading data journalism throughout the newsroom | 2:10 p.m. in International 4-5
    Spotlight: Talking about uncertainty | 2:10 p.m. in International 6-7
    No tipsheet available
    Data negotiation: To FOIA or not to FOIA | 2:10 p.m. in International 8-9
    How to draw the Internet: Better interactives through paper prototyping | 2:10 p.m. in International 10
    No tipsheet available
    Introduction to mapping: Importing and displaying data geographically with QGIS | 2:10 p.m. in M102
    Getting started with Python | 2:10 p.m. in M104
    No tipsheet available
    Mini Boot Camp – Access
    No tipsheet available
    Automate your development life: Build and deploy with Yeoman and Grunt | 2:10 p.m. in M106
    No tipsheet available
    Twitter Bootstrap: Responsive website framework | 2:10 p.m. in M107
    Getting started with SQLite | 2:10 p.m. in M109
    No tipsheet available
    Getting started with machine learning | 3:20 p.m. in International 2-3
    Data smells | 3:20 p.m. in International 4-5
    No tipsheet available
    50 ideas in 60 minutes | 3:20 p.m. in International 6-7
    Flying solo: When your data “team” is just you | 3:20 p.m. in International 8-9
    Visualization for reporting | 3:20 p.m. in International 10
    Mapping 2: Manipulating and editing geographic data with QGIS | 3:20 p.m. in M102
    Python 2 | 3:20 p.m. in M104
    No tipsheet available
    Reporting and presentation with DocumentCloud | 3:20 p.m. in M106
    No tipsheet available
    Summing and grouping in SQLite | 3:20 p.m. in M109
    No tipsheet available
    Editing the data story | 4:30 p.m. in International 2-3
    Reporting out the data story | 4:30 p.m. in International 4-5
    No tipsheet available
    Mise en Place: What a restaurant kitchen can teach us about deadline coding | 4:30 p.m. in International 6-7
    No tipsheet available
    How not to make a fool of yourself with statistics | 4:30 p.m. in International 8-9
    No tipsheet available
    On repeat: How to use loops to explain anything | 4:30 p.m. in International 10
    No tipsheet available
    Web Inspector for complex scrapes | 4:30 p.m. in M102
    Python 3 | 4:30 p.m. in M104
    No tipsheet available
    Advanced Python | 4:30 p.m. in M107
    Joining and advanced operations in SQLite | 4:30 p.m. in M109
    No tipsheet available

    SUNDAY

    The best of broadcast investigations 2014 | 9 a.m. in International 6-7
    No tipsheet available
    Game of your life: Data from day one to the day you die | 9 a.m. in International 8-9
    No tipsheet available
    Shedding light on the dark web: Using tech to build corporate investigations | 9 a.m. in International 10
    No tipsheet available
    Lightning fast data analysis with Tableau: Part 1 | 9 a.m. in M101
    No tipsheet available
    Just enough Django: Distributed data entry in the newsroom | M102
    Excel en Espanol | M103
    No tipsheet available
    Advanced SQL using PostgreSQL | 9 a.m. in M104
    Mini Boot Camp | M105
    No tipsheet available
    Deep dives | 10:10 a.m. in International 6-7
    No tipsheet available
    Finding data trails | 10:10 a.m. in International 8-9
    No tipsheet available
    Using machine learning to deal with dirty data: A Dedupe demonstration | 10:10 a.m. in International 10
    No tipsheet available
    Lightning fast data analysis with Tableau: Part 2 | 10:10 a.m. in M101
    No tipsheet available
    Making your own Yo bot | 10:10 a.m. in M104
    No tipsheet available
    Using Silk.co to analyze data and publish beautiful maps and charts in minutes | 10:10 a.m. in M106
    No tipsheet available
    Career roundtable | 11:20 a.m. in International 6-7
    No tipsheet available
    Mining searchable databases for stories | 11:20 a.m. in International 8-9
    No tipsheet available
    Advanced DocumentCloud: Examples and suggestions | 11:20 a.m. in International 10
    No tipsheet available
    CAR wash | 11:20 a.m. in M104
    No tipsheet available
    Take home a text editor | 11:20 a.m. in M109
    No tipsheet available
