Governments, social media platforms, and others working to counter influence operations rely heavily on investigations to inform their understanding of the problem, direct enforcement actions, and guide policy change. Yet investigators themselves have long worried about the lack of agreed-upon, widely disseminated best practices for how this research should be done. In a 2022 Carnegie Endowment for International Peace survey of professionals around the world who work to counter influence operations—in academia, civil society, government, media, and technology—most respondents said they lacked access to best practice guidance in areas such as data collection, privacy protection, and attribution. This dearth of guidance makes it harder for audiences (such as policymakers, journalists, and funders) to judge investigative quality, for new researchers to join the field and address understudied areas, and for practices to be debated and refined over time.
In an effort to document some existing best practices and identify areas of continued ambiguity and debate, Carnegie’s Partnership for Countering Influence Operations (PCIO) convened a group of high-level investigators—those whose work may come closest to representing “best practice” today. The June 2022 workshop brought together about thirty leading figures from the independent investigative community, social media integrity and policy teams, and government agencies.1
The discussions revealed broad agreement on general principles but more variation in how these principles are implemented. For example, participants agreed that investigators should focus on influence operations with the greatest potential for harm, but they had varying methods of assessing harm. Participants agreed on the factors that underlie attribution and what kinds of evidence are most conclusive, but they had differing views on how to evaluate and when to publicize more ambiguous cases. Participants agreed that clear and transparent communication is essential, but they did not always share the same terminology or style. Participants agreed on the value of peer reviews and what they should generally cover, but they had a range of perspectives on when such reviews are necessary, who should carry them out, and how formal they should be.
The workshop also highlighted that even when best practices are known, they can be difficult to follow. Participants cited scarcity of time, funding, and information as barriers to their ideal means of cooperating with other investigators, choosing the most important leads, and producing quality analysis. The need to secure funding, draw attention to one’s work, and prevent harm from ongoing operations all push against slower, more measured analysis. Some of the most useful information for determining an operation’s impact or the identity of its perpetrator is difficult for independent investigators to obtain on their own, which makes attribution challenging. Participants also explored how shifts in the conduct of influence operations have changed the types of evidence analysts find reliable. Time pressure also complicates peer review, for which there are many models, all of which involve trade-offs.
Several major conclusions emerged from these discussions.
First, despite widespread belief that best practices do not exist, the most respected investigators do agree on many key principles. While they do not always implement these principles in a uniform way, top investigators have collectively developed a menu of approaches that peers generally see as sound. This suggests that the apparent dearth of best practices is in part a problem of documentation and dissemination—a potentially solvable problem. Solutions should focus on ensuring that any best practice guidance is reasonably accessible and practically relevant to a diverse, global pool of investigators with varying capabilities, research interests, and operating challenges.2
Second, debates over what constitutes best practice are often about how best to implement a set of generally shared principles under challenging and varied circumstances. While some of these implementation debates may be possible and important to resolve, others may be irresolvable (with current knowledge) or reflect a range of reasonable judgments based on investigators’ differing environments and goals. For example, the evidentiary value of IP addresses and time zone patterns—something that investigators debated in the workshop—may vary from case to case and evolve over time.
Finally, the fact that some investigators use problematic methods does not always indicate a true lack of agreement on what constitutes best practice; often, it stems from operational constraints. Even if best practice guidance were widely available, investigators would still have limited time, resources, and expertise, and they would sometimes face perverse incentives. The workshop discussions revealed that even top-tier investigators struggle with these issues. As the community continues to explore best practices, investigators should consider defining multiple tiers of quality—from minimally acceptable to ideal—that take into account real-world trade-offs.
What Tools and Standards Are Available to Prioritize Investigations?
The first area of best practice explored in this workshop was how investigators should prioritize among multiple leads competing for their attention. Workshop participants suggested several factors they often see investigators use for this purpose. Many were proxies for the potential impact of an operation: the risk of offline harm or violence, the reach or spread of the operation’s messaging, or the likelihood that the operation will deceive large numbers of people in ways that undermine election integrity or public health. Participants largely agreed on basic principles in this area—such as minimizing the risk of harm—but following these principles often depends on imperfect metrics.
Of these, “reach” was the trickiest factor to define and agree on. Some workshop participants said operations that are trending or viral deserve priority over those that are not. Others said analysts should consider the effect of an operation on its target audience: operations can still be impactful if they target a narrow, but important, group rather than the population at large. Such a group might be especially vulnerable or politically significant, and messages that reach it can be impactful without being far-reaching.
During the workshop, participants mentioned Ben Nimmo’s “Breakout Scale” as one tool for thinking about the size and impact of an operation on public discourse. The scale suggests that operations that spread to multiple social media platforms, several communities, or offline media are more serious than those that do not. However, many operations today are cross-platform from their onset, and operations with a narrow but important target audience can be impactful without breaking out into new communities.
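To illustrate the kind of logic such a tool encodes, the sketch below (a simplified, hypothetical encoding in Python, not the actual Breakout Scale) ranks an operation’s apparent spread using only the dimensions mentioned above: the number of platforms reached, the number of communities reached, and whether the operation has crossed into offline media. The class, function, and tier thresholds are illustrative assumptions.

```python
# Minimal, hypothetical sketch: rank an operation's apparent spread using only
# the dimensions discussed above. Thresholds and names are illustrative, not
# the actual Breakout Scale.
from dataclasses import dataclass


@dataclass
class OperationSpread:
    platforms_reached: int        # distinct social media platforms
    communities_reached: int      # distinct online communities
    offline_media_pickup: bool    # picked up by offline media


def rough_breakout_tier(spread: OperationSpread) -> int:
    """Return a rough 1-4 tier; higher means wider apparent spread."""
    if spread.offline_media_pickup:
        return 4  # crossed from online spaces into offline media
    if spread.platforms_reached > 1 and spread.communities_reached > 1:
        return 3  # spread across both platforms and communities
    if spread.platforms_reached > 1 or spread.communities_reached > 1:
        return 2  # spread along one dimension only
    return 1      # contained to a single platform and community


# Example: a two-platform, single-community operation with no media pickup
print(rough_breakout_tier(OperationSpread(2, 1, False)))  # -> 2
```

As the caveats above suggest, any such scoring treats spread as a proxy for impact; an operation aimed at a narrow but important audience could score low on this kind of measure and still matter a great deal.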
If the Breakout Scale is correct and cross-platform operations are more impactful, then they are especially important subjects for investigators. Unfortunately, workshop participants said that the lack of tools makes investigating cross-platform operations difficult. They also commented on the difficulty of tracking operations in closed online communities or on private messaging applications, spaces they have difficulty accessing and where privacy laws and ethical standards limit available approaches.
Participants also cited the novelty of an operation’s message or techniques as a potential reason to prioritize it for investigation. It might be more alarming, for example, if an operation gets a new narrative to stick in the public consciousness, because this would suggest the operation’s influence goes beyond amplifying preexisting viewpoints to independently and directly guiding political discourse. However, other participants cautioned against prioritizing based on novelty alone: they felt a new message or operational technique merited increased priority only if it seemed especially harmful or told the investigative community something of value about how operations might be changing.
Time is a serious challenge for prioritization. Many participants noted the trade-off between speed and thoroughness; timely investigations might prevent harm, but slipshod ones can cause it by misinforming the public and policymakers or silencing authentic speech online. Investigators do not always have access to the most useful information on the reach, scale, or likely impact of an operation during the early stages of their work. It may not be until the middle of an investigation that analysts have enough information to weigh priorities or the confidence to share evidence with other stakeholders. Because social media platforms have unique access to certain signals (for example, user reports and number of impressions), they could help ameliorate this problem by supplying additional data to analysts who provide platform staff with early leads—though it is unclear whether all platforms have the staff capacity to do this.
Worryingly, some participants noted that the imperative to attract attention from policymakers, media, and especially funders can interfere with setting priorities. While publicity and the attention of key audiences are useful and important, there is a risk that analysts will pursue them by making inflated claims. Not every journalist or policymaker has an accurate sense of which threats are likely to have the highest impact, and investigators’ priorities can be warped if they feel pressure to pursue certain topics even though they suspect others are more crucial.3 When these shortcomings manifest, they are detrimental to the field, and participants agreed that the likelihood an investigation will attract media attention is a poor factor for prioritization.
When Is It Appropriate to Call Something an Influence Operation and Share Information with Other Investigators?
As with prioritizing investigations, workshop participants said the decision to privately share information on a suspected operation with other investigators, industry professionals, or government representatives can be guided by the risk of harm. Emergency or high-risk situations can justify flagging potential operations even if investigators have less confidence in their conclusions about the nature of the operation or the identity of its perpetrator (though an investigator’s confidence level in their assessment should always be articulated). Information sharing within the field can be part of a virtuous cycle in which investigators who routinely share thoughtful analysis gain the trust of their community and may receive additional leads from others. But investigators must navigate how, when, and in what manner to do so.
Participants cited certain types of information that investigators should include when making a responsible disclosure: the investigator’s definition of influence operations, evidence that the activity under investigation fits that definition, and the investigator’s clearly communicated confidence level for any claims. This last point is especially important when findings are made public; speculative reports can be useful leads for others but should be prominently labeled as such.
Trade-offs come into play when considering whether to disclose an investigation publicly or privately. Done responsibly, public disclosure might build societal resilience and momentum for policy change—or at least, that is the hope. However, there is also a fear that public disclosure might cause disproportionate alarm and even aid “perception hacking,” in which overblown public concern magnifies an operation’s perceived impact. Limited, private disclosure to trusted partners in industry or government can facilitate takedowns and other actions without some of the risks of public disclosure.
Information sharing also comes with its own practical difficulties. Participants noted that competition between investigators for funding and attention can make them less likely to share specific leads and details with one another. Some participants suggested that independent investigators lack a consistent format for reporting their findings to government and industry stakeholders, which makes it more difficult for third parties to quickly assess their work and compare trends across influence operations. They pointed to the DISARM framework, which is based on approaches used by cybersecurity professionals, as a potential solution.
At least one participant noted that industry investigators have invested in mechanisms to rapidly verify and disseminate information among their internal teams and peer companies. External investigators do not have this advantage. In the future, the community might consider ways to extend the rapid information-sharing processes used by industry professionals to external partners.4
How Valuable Are Common Signals Used to Attribute an Operation to an Actor?
Attributing an influence operation to a specific actor is a difficult, high-risk decision for investigators. Workshop participants repeatedly stressed that investigators should be transparent about their confidence in attributions they do make, use nuanced language in their analysis, and bear in mind any competing hypotheses. Independent investigators said they have some room to publicly offer speculative, lower-confidence attributions in the course of describing evidence from their investigation, provided the confidence level is clearly stated. However, representatives from industry and government said they assume greater legal and political risk when making attributions and typically only do so when highly confident.
This workshop session asked participants to name factors they consider when attributing an influence operation to a specific actor and to characterize each factor as low-, medium-, or high-value evidence for making an attribution. Participants in this session often agreed, but there were instances of disagreement or ambiguity—suggesting that even among the most respected investigators, attribution is not an exact science.
The results also come with a few caveats. First, there may not be a one-to-one correlation between these tiers of evidence and specific levels of confidence: the presence of a single high-value factor does not guarantee a high-confidence attribution.5 Second, the weight of any particular kind of evidence may vary from case to case. Third, the totality of evidence must be weighed before reaching a judgment.
The high-value factors mentioned during the discussions were often forms of evidence obtained offline and used to infer an actor’s identity; acquiring this evidence might even require infiltration of a threat actor’s planning process. There was general agreement on the strength of these factors, which included:
- Investigative reporting or intelligence revealing offline activity that can credibly link actors to one another—for example, financial documents or other records
- Evidence identifying the moderators of specific Facebook groups or other platform features (for example, Facebook pages and events or YouTube channels) who have known affiliations with a threat actor
- Shared infrastructure like cryptocurrency wallets, Amazon storefronts, or advertising accounts
- Evidence from private chats where operations are planned
- Open-source intelligence allowing investigators to geolocate an actor
Medium-value factors are more complicated to assess. Often they suggest, but do not definitively prove, the identity of an actor, or their utility is affected by other evidence or context. Participants sometimes disagreed on the reliability of these indicators, which in some cases has changed over time as tactics have evolved. Participants also cautioned against relying too heavily on some of these factors. Medium-value factors included:
- “Shift patterns,” when posts are made within business hours in a certain time zone, which some participants noted are of decreasing value as operations become better at hiding their origin through outsourcing and other methods
- Well-defined playbooks used repeatedly and without adaptation by a specific actor, though many actors draw on similar playbooks because similarities in platform features limit the range of techniques available
- Evidence of an operation’s overall sophistication, as more sophisticated operations may require more resources to run
- Technical signals like IP addresses and domain name registrations, especially when they point to lower-volume web servers known to belong to a specific actor, but not when they point to high-volume servers such as those provided by commercial web hosts6
Low-value factors tell investigators relatively little about the identity of an actor. When a participant mentioned one of these factors for discussion (not necessarily to endorse it), others often raised serious concerns about its reliability. Low-value factors included:
- Alignment between specific content and a known actor’s position or objective, which might also be shared with other actors or real users
- Specific language patterns, for example, if a network of accounts makes similar grammatical errors or uses key phrases distinctly tied to state media—things that can be purposely spoofed, may reflect organic user activity, or may be common to multiple actors
When asked if frequently used forms of network analysis were useful for making attributions, industry participants were skeptical.7 They said that network graphs of clustered accounts are often produced by commercially available software without proper context defining the connections between nodes, making it difficult to say whether they show coordination or anything else of significance. Participants cautioned against uncritical acceptance of attribution claims based on this form of analysis.
Some participants suggested signs of inauthenticity (that an actor is not who they claim to be) or coordination (that a network of actors is operating with a shared objective) as factors for attribution. Others said that while these might be suspicious signs that an operation is underway, they do not suggest much, if anything, about the specific identity of the actor behind it. As such, they are not useful attribution signals.
Participants discussed two other general points. First, a combination of lower-strength signals is not always greater than the sum of its parts. Rather than trying to add together speculative evidence, investigators who lack high confidence may be best served by passing leads along to trusted partners for further analysis. Second, while attribution can be a powerful tool for drawing attention to an operation, many participants noted that it is not required for platforms to remove accounts or content that are violative for other reasons (for example, if an account is demonstrably operating under a false identity even though the true identity of the actor behind it remains uncertain). In this sense, some participants believed that the importance of attribution is sometimes overemphasized.
What Considerations Go into Establishing Peer Review Processes?
Workshop participants also discussed how different forms of peer review can help refine investigators’ work. First, some suggested that not every analysis needs to be peer reviewed and that the robustness of a review depends in part on the investigation’s target audience: it is less important to thoroughly review lower-confidence assessments shared privately as leads within the community than to review those that make bold claims with high confidence or are intended for policymakers or the press.
The discussion raised a few key questions that a review should assess. Each of these questions could merit its own longer discussion to define relevant research standards and best practices, but even a brief, informal review would be expected to flag major concerns related to them:
- Are the findings well explained and reproducible?
- Does the investigation draw on appropriate data sources in a transparent way?
- Are the assumptions guiding the research and its conclusions reasonable and transparently acknowledged?
- Does it express the author’s level of confidence in their assessment? Are there elements of uncertainty that should be explained?
- Does the investigator properly define key terms, like “manipulation”?
- Does the investigator approach questions of attribution responsibly?
- Does the investigator incorporate the appropriate country-specific context and expertise? Is the reviewer able to assess this?
Overall, the discussion suggested there is probably no single best way to conduct a peer review. PCIO’s 2022 survey found that nearly three-quarters of researchers report some form of internal peer review; this was especially true of respondents from media and civil society. Workshop participants felt that while this approach does not draw on as diverse a range of perspectives as a wider community review, it is easier to apply consistently.
Most of the discussion focused on external peer reviews. Fewer survey respondents—though still 57 percent—reported using external peer reviews; this was most common among academics and technology professionals. Civil society activists seemed to draw on both internal and external peer reviews, while government employees were least likely to use either (perhaps, in part, for security reasons).
Both internal and external peer reviews can be more or less formal, and participants seemed to make use of varying formality levels depending on the circumstances. An informal peer review process in which one investigator asks a close peer or colleague for feedback might be quicker and easier to initiate than a formal mechanism with clearly defined rules, processes, and roles for different actors, all of which must be determined in advance. On the other hand, some participants believed that the process of developing a formal peer review mechanism might lead to productive conversations within teams or the wider community about standards and best practices (a potential benefit from best practice guidance suggested by past PCIO survey participants). They also suggested a formal review process could serve as a publicly visible signal of quality for external stakeholders trying to distinguish strong analysis from weaker work. However, others raised the possibility that defining a formal mechanism as the gold standard could unintentionally have a gatekeeping effect that limits the growth and inclusiveness of the investigative community.
Finally, at least one platform representative suggested that industry practices for reviewing investigations could serve as the basis of a more commonly used peer review process. This individual indicated that they already ask an internal set of routine questions about why external investigators believe a set of accounts is inauthentic, coordinated, harmful, or attributable to a specific actor; industry could collaborate to standardize this practice across companies and require it of external partners. Such a process would be quicker to stand up because it would require input from a smaller set of stakeholders—integrity teams at relevant companies rather than the wider community. But a more inclusive process for establishing a review mechanism would be more democratic and would cede less control to an industry in which transparency remains a major concern.
Opportunities to Establish, Promote, and Improve Best Practice
Findings from PCIO’s survey and workshop indicate that there is a basic foundation for best practices for assessing online threats. Some published guidance exists, some researchers rely on internal guidance, and top-level investigators agree on many basic principles, such as prevention of harm and disclosure of confidence levels. But when asked to explore specific operational questions, the participants outlined a series of challenges, trade-offs, and judgment calls.
In some cases, best practice in these areas may simply help investigators make justifiable choices, even if those choices are not risk free. In other cases, it may be possible to streamline the decisionmaking process or remove some uncertainty from it by improving important metrics. For example, the difficulty of measuring the impact of influence operations is a recurring issue in previous PCIO research and came up in both the survey and the workshop. If investigators can refine best practices for assessing online threats, it could have significant ramifications for priorities across the entire field. Improvements on current measures, like Nimmo’s Breakout Scale, should be a priority for future exploration.
PCIO’s research in this area also identified possible opportunities for cross-sector learning and cooperation. For instance, while social media companies have invested in mechanisms for rapid information sharing among themselves, independent investigators lack the scale to set up similar infrastructure. This makes it more difficult for them to quickly follow up or receive feedback on early leads. During the workshop, some participants suggested that social media companies should explore ways to extend these processes to independent investigators. More broadly, respondents to PCIO’s 2022 survey who work in industry or government reported greater access to best practice guidance and that this guidance is mostly internal; more can be done to share these practices with other sectors.
PCIO’s survey indicated a clear desire for more best practice guidance across the community on a range of issues. Not all of the questions about standards and best practices that participants agreed were important could be explored in depth during this workshop; the community could take up these priority questions in future work—for example, by starting with a review of the publicly available resources provided to PCIO by survey respondents and then identifying gaps to fill. More work should also be done to ascertain the precise ways in which existing guidance is inaccessible to novice researchers, such as language availability or technical sophistication. The attribution discussion, in particular, revealed a need for further community dialogue to clarify approaches to this issue: in PCIO’s 2022 survey on best practice guidance, “collaboration and networking” was the second-most preferred means of disseminating best practices, while events were third. (Published reports and journals were first.)
Clearly, there is a great deal of work to do—both to promote agreed-upon best practices among up-and-coming investigators and to refine the operations of well-established efforts. But the field is not starting from scratch. A small but growing body of largely agreed-upon knowledge does exist to guide influence operations investigators. While it may not eradicate operational obstacles, it can help investigators navigate them and, in some cases, suggest ways to reduce them. This knowledge should be expanded, refined, and put into the hands of more investigators.
Carnegie’s Partnership for Countering Influence Operations is grateful for funding provided by the William and Flora Hewlett Foundation, Craig Newmark Philanthropies, the John S. and James L. Knight Foundation, Microsoft, Facebook, Google, Twitter, and WhatsApp. The PCIO is wholly and solely responsible for the contents of its products, written or otherwise. We welcome conversations with new donors. All donations are subject to Carnegie’s donor policy review. We do not allow donors prior approval of drafts, influence on selection of project participants, or any influence over the findings and recommendations of work they may support.
Notes
1 This article is based on the results of a workshop conducted by Carnegie’s Partnership for Countering Influence Operations (PCIO) at the Atlantic Council’s Digital Forensic Research Lab’s (DFRLab’s) 360/Open Summit in June 2022. The workshop’s participants cycled through four discussion topics in small groups. Each topic was covered by a moderator who led the groups through fifteen-minute conversations. The moderators were Flora Carmichael, assistant editor at the BBC’s Trusted News Initiative, who led discussions on the tools and standards available to prioritize investigations; Yoel Roth, then head of trust and safety at Twitter, who led discussions asking what criteria should be met before assessing activity to be an influence operation; Dean Jackson, project manager for PCIO’s Influence Operations Researchers’ Guild, who led discussions on the value of common signals for attributing operations to an actor; and, finally, Patrice Wangen, a social data scientist at the European External Action Service, who led discussions on desirable standards for peer review. The workshop was designed with these moderators as well as DFRLab Nonresident Senior Fellow Ben Nimmo, who could not attend the session, and PCIO Director Alicia Wanless, who facilitated overall.
2 For more discussion on these challenges and trade-offs, see Victoria Smith and Jon Bateman, “Best Practice Guidance for Influence Operations Research: Survey Shows Needs and Challenges,” Carnegie Endowment for International Peace, August 2, 2022, https://carnegieendowment.org/2022/08/02/best-practice-guidance-for-influence-operations-research-survey-shows-needs-and-challenges-pub-87601.
3 For example, a series of roundtables convened by PCIO with investigators from different regions of the world raised concerns about the abundance of funding to investigate foreign operations, even though many investigators, especially in Latin America and Africa, felt domestic operations were a larger threat.
4 This was not the only area in which industry professionals had a reported advantage. In fact, PCIO’s 2022 survey found that industry and government professionals are more likely to have access to written best practice guidance than those in academia or civil society.
5 For more information on confidence levels and their application in this field, consider: United Kingdom Government Communication Service, “Impact Analysis,” in RESIST 2 Counter Disinformation Toolkit, 2021, https://gcs.civilservice.gov.uk/publications/resist-2-counter-disinformation-toolkit/#Impact-analysis.
6 This was a point of some disagreement; a few participants offered IP addresses as a more solid indicator, but others were skeptical, citing factors like the ability to mask IP addresses. IP addresses are a good example of a “medium-value” factor that requires additional context to interpret usefully.
7 For examples of this form of data analytics and how to interpret them, see: Nick Monaco and Daniel Arnaudo, “Data Analytics for Social Media Monitoring: Guidance on Social Media Monitoring and Analysis Techniques, Tools and Methodologies,” National Democratic Institute, May 2020, https://www.ndi.org/sites/default/files/NDI_Social%20Media%20Monitoring%20Guide%20ADJUSTED%20COVER.pdf.