Branching dynamics of viral information spreading

José Luis Iribarren and Esteban Moro


Physical Review E 84, 046116 (2011)  [pdf]

Abstract
Despite its importance for rumors or innovations propagation, peer-to-peer collaboration, social networking, or marketing, the dynamics of information spreading is not well understood. Since the diffusion depends on the heterogeneous patterns of human behavior and is driven by the participants’ decisions, its propagation dynamics shows surprising properties not explained by traditional epidemic or contagion models. Here we present a detailed analysis of our study of real viral marketing campaigns where tracking the propagation of a controlled message allowed us to analyze the structure and dynamics of a diffusion graph involving over 31 000 individuals. We found that information spreading displays a non-Markovian branching dynamics that can be modeled by a two-step Bellman-Harris branching process that generalizes the static models known in the literature and incorporates the high variability of human behavior. It explains accurately all the features of information propagation under the “tipping point” and can be used for prediction and management of viral information spreading processes.

Affinity Paths and information diffusion in social networks

José Luis Iribarren and Esteban Moro
Social Networks 33, 134-142 (2011)  [pdf]

Abstract
Widespread interest in the diffusion of information through social networks has produced a large number of Social Dynamics models. A majority of them use theoretical hypotheses to explain their diffusion mechanisms while the few empirically based ones average out their measures over many messages of different contents. Our empirical research tracking the step-by-step email propagation of an invariable viral marketing message delves into the content impact and has discovered new and striking features. The topology and dynamics of the propagation cascades display patterns not inherited from the email networks carrying the message. Their disconnected, low transitivity, tree-like cascades present positive correlation between their nodes' probability to forward the message and the average number of neighbors they target and show increased participants’ involvement as the propagation paths length grows. Such patterns, not described before nor replicated by any of the existing models of information diffusion, can be explained if participants make their pass-along decisions based uniquely on local knowledge of their network neighbors' affinity with the message content. We prove the plausibility of such a mechanism through a stylized, agent-based model that replicates the Affinity Paths observed in real information diffusion cascades.

The speed and reach of forwarded emails, rumors, and hoaxes in electronic social networks

We have just published an experimental/theoretical work on the speed of information diffusion in social networks in Physical Review Letters. Specifically, we have studied the impact of the heterogeneity of human activity on the propagation of emails, rumors, hoaxes, etc. By tracking email marketing campaigns executed by IBM Corporation in 11 European countries, we were able to compare their viral propagation with our theory (see the campaign details below).

The results are very simple. Let me give you an example: the typical time between two emails sent by the same person is around 1 day. Traditional models of information diffusion would then predict a characteristic propagation time of about 1 day. However, some email computer viruses spread widely in a matter of hours (sometimes minutes), while some viral propagations (for example the Veuve-Clicquot hoax) last for years. How can that occur? The reason is that traditional models neglect the large heterogeneity in the frequency of human activity: the average time between emails (1 day) does not actually represent the population. In fact, most of us respond very quickly to emails, but some take a long time to do so. This fact (known and discovered previously by others) has a profound consequence for the way information spreads:

  1. When information spreads “successfully”, in the sense that it propagates and reaches most of the population (i.e. it surpasses the tipping point), its propagation speed is determined by the people who are most active.
  2. However, when information reaches just a small fraction of the population (below the tipping point), its propagation is controlled by those who take a long time to respond or forward, and the spreading is very slow.
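
The two regimes above can be illustrated with a small numerical sketch. It is not the model fitted in our paper: it assumes a hypothetical two-point response-time distribution (90% of people forward after 0.1 days, 10% after 9.1 days, so the mean is still 1 day) and uses the standard growth-rate condition for an age-dependent branching process, r0 * E[exp(-alpha * tau)] = 1, where r0 is the average number of people each participant passes the message to. The names `malthusian_rate`, `delays` and `probs` are illustrative.

```python
import math

def malthusian_rate(r0, delays, probs, lo=-5.0, hi=50.0, tol=1e-12):
    """Solve r0 * sum(p * exp(-alpha * tau)) = 1 for alpha by bisection.

    alpha > 0 means exponential growth (above the tipping point),
    alpha < 0 means exponential decay (below the tipping point).
    """
    def f(alpha):
        return r0 * sum(p * math.exp(-alpha * t) for p, t in zip(probs, delays)) - 1.0

    # f is strictly decreasing in alpha, so bisection works once the root is bracketed.
    if not (f(lo) > 0.0 > f(hi)):
        raise ValueError("root not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical response-time distributions, both with a mean of 1 day.
homogeneous = ([1.0], [1.0])              # everybody waits exactly 1 day
heterogeneous = ([0.1, 9.1], [0.9, 0.1])  # most are fast, a few are very slow

# Above the tipping point (r0 = 2) and below it (r0 = 0.5); values are assumptions.
growth_homogeneous = malthusian_rate(2.0, *homogeneous)
growth_heterogeneous = malthusian_rate(2.0, *heterogeneous)
decay_homogeneous = malthusian_rate(0.5, *homogeneous)
decay_heterogeneous = malthusian_rate(0.5, *heterogeneous)

print("growth rate per day:", growth_homogeneous, growth_heterogeneous)
print("decay rate per day: ", decay_homogeneous, decay_heterogeneous)
```

With these assumed numbers, the heterogeneous population grows several times faster than the homogeneous one above the tipping point, because the fast responders set the pace, and it decays several times more slowly below the tipping point, because the slow responders keep the message alive. The average response time is 1 day in both cases.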

This phenomenon, as explained in our paper, has consequences for viral marketing, the diffusion of fads and hoaxes, and opinion dynamics, because the propagation speed of their messages depends strongly on the size of the sub-communities of very active and not-so-active people. For example, in our campaigns (which were below the tipping point yet successful from a viral marketing perspective), endogenous propagation of the commercial message lasted for months while the average time between getting the message and forwarding it was only 1 day. We also found that messages do not “go viral”: they are viral because of the diffusion mechanism they use, but their spreading success largely depends on the social network propensity and heterogeneous behavior.

Finally, our work has some consequences for the way we model and understand human dynamics, since it shows that there is no such thing as a typical time scale in human dynamics. This is in sharp contrast with epidemic models, information diffusion models, etc., in which the heterogeneity in human activity and frequency is usually neglected in favor of a more homogeneous picture of human activity.

About the empirical data:
The viral marketing campaigns were conducted by IBM using the typical “refer-a-friend” mechanism, which led to the endogenous diffusion of information. The campaigns’ offerings were promoted on the IBM homepage, where initial participants heard about them. Their primary marketing objective was to generate subscriptions to the company’s on-line newsletter. Subscriptions were entered through a form located on the campaign's main web page (a.k.a. registration page). Additionally, a viral propagation mechanism, accessible through a button on the registration page, was available to foster the message propagation. The button caption enticed visitors to recommend the page to friends and colleagues by offering, as an additional incentive to forward the page, tickets for a prize draw to win a laptop computer. More technical details about the campaigns can be found in Appendix D of the arXiv version of our paper.

Press coverage:

  • ‘Infectious’ people spread memes across the web, New Scientist (12/08/09)
  • Email hoaxes are like viruses, The Inquirer (10/08/09)
  • The flow of viral video, ABC News (8/08/09)
  • New model for social marketing campaigns details why some information ‘goes viral’, PhysOrg (6/08/09)
  • Los perezosos frenan los rumores en Internet, ABC.es (14/8/09)
  • Party people spread viral internet memes, ComputerWeekly (14/8/09)
  • Desvelan las claves de la difusión de la información en las redes sociales, PlataformaSINC.es (7/9/09)
  • Nuevas claves para la difusión de información en las redes sociales, Noticias Madri+d (7/9/09)

Percentages and absolute numbers

    The percentage of active users in the Internet 2.0 is tiny. Some examples:

    • only 1% of Wikipedia’s users contribute to making it better
    • only 0.1% of users upload their own videos to Youtube
    • only 3% of people with weblogs post on a daily basis
    • only 1% of Amazon.com customers contribute with reviews
    The numbers are tiny, but not unusual. Typical return rates of marketing campaigns or surveys are around 2-5% (see report by the Direct Marketing Association). In our experiments with viral marketing campaigns we got up to 8% by triggering the action of clients using a prize contest.
    So, how do all these businesses survive? The reason is absolute numbers. Percentages are small, but absolute numbers are huge:
    • 70000 active contributors maintain Wikipedia
    • 65000 videos are uploaded daily to Youtube
    • 1.6 million posts are created daily according to Technorati
    Even for the dubious business of email spam, absolute numbers matter: tens or hundreds of people usually answer an email spam campaign out of hundreds of millions of emails sent, which yields a response rate of about 0.0001% [pdf]. But this low response rate does not matter, since sending email spam is a freemium business which uses the near-zero marginal cost of online distribution.
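
The arithmetic behind the spam example can be written out as a short sketch. The campaign size of 300 million emails is a hypothetical figure chosen only to match "hundreds of millions"; the 0.0001% response rate is the one quoted above, and `expected_responses` is an illustrative helper name.

```python
def expected_responses(audience: int, response_rate_percent: float) -> float:
    """Absolute number of responses for a given audience and percentage rate."""
    return audience * response_rate_percent / 100.0

# Hypothetical campaign size; only the response rate comes from the text above.
spam_emails_sent = 300_000_000
spam_responses = expected_responses(spam_emails_sent, 0.0001)

print(round(spam_responses))  # a few hundred replies despite the tiny percentage
```

A rate that looks negligible as a percentage still produces hundreds of replies once the audience is large enough, which is the point of comparing percentages with absolute numbers.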