Every time we use an app to order a meal, drive a connected car, log our bike ride, or buy a coffee, we generate data. As consumers, we accept surrendering a bit of personal data in exchange for the conveniences our connected world provides. But what if these data, handled in a privacy-principled and ethical manner, could also improve our lives as citizens in the public commons by equipping policymakers to make better-informed decisions? What if, through composite datasets (for example, on mobility, economic activity, land use, traffic safety, and demographics), we knew how, when, and why people moved around their communities, so that we could design more impactful infrastructure, such as multimodal streets that reduce traffic deaths or that cut vehicle miles traveled and thus carbon emissions?
Big data (large datasets that can be analyzed mathematically to reveal patterns and trends) and machine learning (computational systems that learn and adapt by using algorithms and statistical models to analyze and draw inferences from patterns in data) now give data users the power to process huge volumes of information and make it insightful and actionable, enabling data-informed decisionmaking. This can complement traditional methods of data collection, such as household surveys, which can be time-consuming and costly and have varying participation rates. By contrast, big data, as the U.S. Census Bureau notes, are “collected passively as the digital exhausts of personal and commercial activities.” This article examines two areas in which data-informed decisionmaking can have measurable impact: (1) supply-chain management and its effects on communities and (2) traffic safety analysis. It also makes the case for building additional public sector data literacy and capacity.
Data-Informed Supply-Chain Management and Community Impacts
Last year, the White House announced a new initiative to improve access to key data about supply chains in order to address the disruptions and congestion that have been prevalent since the early days of the COVID-19 pandemic. Through Freight Logistics Optimization Works (FLOW), the Biden-Harris administration is pursuing the creation of an information exchange that can give stakeholders better data about what is happening in supply chains.
There is power in having comprehensive sources of data at the center of conversations about building more resilient and adaptable supply chains. For example, the California Transportation Commission and the U.S. Army Corps of Engineers recently leveraged large mobility, economic activity, and land-use datasets from Replica. (Author’s note: I currently serve as Replica’s chief legal officer.) The data were used to help California policymakers better understand supply chains, the movement of goods, and the state of congestion at California’s ports.
The challenge is that supply chains are incredibly complex, as are the myriad ways they interact with infrastructure, mobility patterns, and economic activity throughout the built environment. This complexity is pronounced at major ports, especially when a disruption causes shipping congestion. Yet there is no fast, reliable, and comprehensive source of data that every stakeholder can use to get a clear picture of what is going on.
A dashboard that brings together big data about mobility and consumer spending with publicly available data on harbor vessel traffic and queuing (for example, from the Marine Exchange of Southern California), air quality (for example, from the Clean Air Action Plan), and other relevant sources can yield a better understanding of how port congestion relates to local trips and economic activity.
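As a minimal sketch of one calculation such a dashboard might run, the Python below inner-joins two date-keyed series (hypothetically, daily counts of vessels queued offshore and daily truck trips near the port) and computes their Pearson correlation. The series names and sample values are invented for illustration and do not reflect the schema or contents of the Marine Exchange, Replica, or any other source. Pearson correlation is simply one assumed way to summarize the relationship, and it does not establish causation.

```python
from math import sqrt


def join_by_date(left: dict[str, float], right: dict[str, float]) -> list[tuple[str, float, float]]:
    """Inner-join two {ISO date: value} series, keeping only dates present in both."""
    shared = sorted(left.keys() & right.keys())  # ISO dates sort chronologically as strings
    return [(day, left[day], right[day]) for day in shared]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical, made-up values for illustration only (not real port or trip data).
vessels_queued = {"2021-10-01": 40, "2021-10-02": 55, "2021-10-03": 70}
truck_trips = {"2021-10-01": 9000, "2021-10-02": 8200, "2021-10-03": 7100, "2021-10-04": 7000}

rows = join_by_date(vessels_queued, truck_trips)  # 2021-10-04 drops out: no queue count that day
r = pearson([queued for _, queued, _ in rows], [trips for _, _, trips in rows])
print(f"{len(rows)} matched days, r = {r:.3f}")
```

An inner join is used so that a day missing from either source is excluded rather than silently treated as zero.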
By tracking such metrics, supply-chain leaders can assess the impacts of operational changes and policy interventions at the subnational level, determine whether they were effective, and, if so, decide whether they should be replicated at other ports. They can also compare the data with the previous year's, when conditions were different.
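The year-over-year comparison can be sketched as a percent change per metric. The metric names and values below are hypothetical placeholders, not figures from any port; a metric with no prior-year value (or a prior value of zero) is reported as `None` rather than forced into a misleading percentage.

```python
from typing import Optional


def yoy_change(current: dict[str, float], previous: dict[str, float]) -> dict[str, Optional[float]]:
    """Percent change of each current metric relative to the same metric a year earlier."""
    changes: dict[str, Optional[float]] = {}
    for metric, now in current.items():
        before = previous.get(metric)
        # Percent change is undefined without a nonzero baseline (missing or 0).
        changes[metric] = None if not before else 100.0 * (now - before) / before
    return changes


# Hypothetical monthly averages for one port, for illustration only.
this_year = {"avg_vessels_queued": 80, "avg_truck_idle_minutes": 30, "new_sensor_metric": 5}
last_year = {"avg_vessels_queued": 40, "avg_truck_idle_minutes": 40}

for metric, pct in yoy_change(this_year, last_year).items():
    print(metric, "n/a" if pct is None else f"{pct:+.1f}%")
# avg_vessels_queued +100.0%
# avg_truck_idle_minutes -25.0%
# new_sensor_metric n/a
```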
In addition to tracking metrics, big data can help illustrate the primary drivers of port congestion and its second-order impacts on neighboring places, many of them low-income communities of color.
These impacts include air pollution, traffic safety, truck idling times, on-street traffic, and consumer goods shortages. By getting a more complete view of offshore and onshore impacts, policymakers can design more targeted and effective interventions and assess their impacts. For example, how effective are ship queuing systems in reducing port congestion and increasing the potential for equitable health outcomes in neighboring communities?
Privacy-principled big data can improve policymakers’ understanding of the impact that infrastructure investments can have on supply-chain disruptions and port congestion. Further, comprehensive, disaggregated mobility and socioeconomic data can help address urgent needs around inequities, a key focus of the federal government’s Justice40 Initiative. The resulting common operating picture can help take the guesswork out of infrastructure planning while targeting investments toward building more equitable and resilient communities.
Understanding High-Conflict Traffic Corridors to Increase Public Safety
According to the National Highway Traffic Safety Administration, 42,795 people died on American roadways in 2022, nearly 30 percent more than ten years earlier. The Department of Transportation has issued policies, such as the National Roadway Safety Strategy, that aim to reduce such deaths to zero. Big data make it possible to map driver events and active mode volumes to create a better and more efficient framework for prioritizing corridors.
Local governments are looking at unconventional strategies for prioritizing safety interventions that prevent crashes and protect vulnerable road users. For example, Culver City in California wanted to better understand the locations and characteristics of its high-conflict corridors to help prioritize safety interventions along city roadways. The city government had access to only a limited amount of crash data and no exposure data (that is, data on how much travel, by car, on foot, or by bike, occurs along a roadway). Further, it had a limited understanding of the sociodemographics, time-of-day patterns, and active mode volumes of trip-takers within city limits. Culver City worked with Replica and Arity to map driver events and active mode volumes and build such a prioritization framework.
High-conflict corridors can be identified with any combination of the following data:
- pedestrian and bike traffic volumes, including for specific age groups such as seniors and children;
- specific driver event data, such as rapid acceleration, hard braking, speeding, collisions, and phone handling;
- driver speeds above certain speed limits;
- weekday or weekend traffic volumes; and
- corridor length.
Further, disaggregated data provide insights into specific combinations of vulnerable users, driver events, and corridor types. Armed with these data, local governments can identify the corridors with the highest concentration of conflicts and the hot spots for targeted policy interventions.
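One simple way to turn inputs like those listed above into a ranked list is a weighted score. The Python below is a minimal sketch under stated assumptions: it normalizes driver events and pedestrian and bike trips by corridor length, min-max scales each input across corridors, and combines them with fixed weights, treating more driver events, more active-mode exposure, and a larger share of senior and child trips as raising priority. The `Corridor` fields, the weights, and the scoring formula are hypothetical choices for illustration, not the method Culver City, Replica, or Arity used.

```python
from dataclasses import dataclass


@dataclass
class Corridor:
    """Hypothetical per-corridor aggregates; field names are illustrative assumptions."""
    name: str
    length_miles: float
    driver_events: int          # e.g., hard braking, speeding, phone handling
    ped_bike_trips: int         # daily pedestrian and bike trips
    senior_child_share: float   # share of active-mode trips by seniors and children (0 to 1)

    def __post_init__(self) -> None:
        if self.length_miles <= 0:
            raise ValueError("corridor length must be positive")


def min_max(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; all zeros if every value is identical."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_corridors(corridors: list[Corridor], w_events: float = 0.5,
                   w_exposure: float = 0.3, w_vulnerable: float = 0.2) -> list[tuple[str, float]]:
    """Return (name, score) pairs, highest score first; the weights are assumptions."""
    # Normalize counts by length so long corridors are not favored just for being long.
    events = min_max([c.driver_events / c.length_miles for c in corridors])
    exposure = min_max([c.ped_bike_trips / c.length_miles for c in corridors])
    vulnerable = min_max([c.senior_child_share for c in corridors])
    scored = [
        (c.name, w_events * e + w_exposure * x + w_vulnerable * v)
        for c, e, x, v in zip(corridors, events, exposure, vulnerable)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up corridors for illustration only.
corridors = [
    Corridor("A Street", 1.0, 120, 800, 0.3),
    Corridor("B Avenue", 2.0, 100, 600, 0.1),
    Corridor("C Boulevard", 0.5, 40, 500, 0.2),
]
for name, score in rank_corridors(corridors):
    print(f"{name}: {score:.2f}")
```

Min-max scaling keeps inputs with very different units (events per mile versus a 0-to-1 share) on a comparable footing; a real prioritization would need weights chosen with local stakeholders.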
Building Public Sector Data Literacy and Capacity
Equipping policymakers with such data can also help support more nuanced conversations about infrastructure interventions that improve the public health, welfare, and safety of communities. As data literacy becomes increasingly regarded as a core workforce competency, there are questions to consider as public bodies look to build additional capacity:
- What is the role of disaggregating data in mitigating bias in public decisionmaking? For example, how can policymakers gain more targeted insight by slicing and viewing data according to variables such as age, socioeconomic status, gender, race, or location? Once data are broken down and filtered by different variables, policymakers can tell a more refined story.
- How do traditional data-collection methods, such as household surveys with their lag between collection and action, compare with big data capabilities for getting a more robust picture of what is happening in the built environment?
- How can policymakers make sure that meaningful privacy regulations are developed and enforced so that democratic norms are upheld?
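The slicing described in the first question can be sketched in a few lines of Python. The trip records, field names, and the `min_count` threshold below are hypothetical; withholding shares for very small groups is included as one common privacy safeguard, not as a specific regulatory requirement.

```python
from collections import defaultdict
from typing import Optional


def mode_share_by(trips: list[dict[str, str]], group_key: str, mode: str,
                  min_count: int = 1) -> dict[str, Optional[float]]:
    """Share of trips using `mode` within each value of `group_key`.

    Groups with fewer than `min_count` trips are suppressed (None) to limit re-identification risk.
    """
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for trip in trips:
        group = trip[group_key]
        totals[group] += 1
        if trip["mode"] == mode:
            hits[group] += 1
    return {g: (hits[g] / totals[g] if totals[g] >= min_count else None) for g in totals}


# Hypothetical synthetic trip records (not real data).
trips = [
    {"income_group": "low", "age_group": "65+", "mode": "transit"},
    {"income_group": "low", "age_group": "18-64", "mode": "transit"},
    {"income_group": "low", "age_group": "18-64", "mode": "car"},
    {"income_group": "high", "age_group": "18-64", "mode": "car"},
    {"income_group": "high", "age_group": "65+", "mode": "transit"},
    {"income_group": "high", "age_group": "18-64", "mode": "car"},
    {"income_group": "high", "age_group": "18-64", "mode": "car"},
]

overall = sum(t["mode"] == "transit" for t in trips) / len(trips)
by_income = mode_share_by(trips, "income_group", "transit")
by_age = mode_share_by(trips, "age_group", "transit")
by_age_suppressed = mode_share_by(trips, "age_group", "transit", min_count=3)
print(overall, by_income, by_age, by_age_suppressed)
```

In this toy sample, the overall transit share (3 of 7 trips) masks a gap between the low-income group (2 of 3) and the high-income group (1 of 4), which is the kind of difference disaggregation is meant to surface.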
This era of big data and machine learning is opening new frontiers for policymakers, both in the baseline questions they can ask (for example, how many people from low-income communities of color use public transit versus cars to reach jobs in central business districts) and in how they can better measure the outcomes of policy interventions (for example, how ridership would change if public transit became fare free). As capacity-constrained subnational governments continue to do more with less while addressing multiple crises at once, such as a global pandemic, a housing crisis, and climate-induced wildfires, now is the time to learn about and embrace what data-informed decisionmaking makes possible.