Social media platforms promise to enable rich and vibrant conversations online; however, their potential is often hindered by antisocial behaviors. In this paper, we study the relationship between structure and toxicity in conversations on Twitter. We collect 1.18M conversations (58.5M tweets, 4.4M users) prompted by tweets that are posted by or mention major news outlets (over one year) and candidates who ran in the 2018 US midterm elections (over four months). We analyze the conversations at the individual, dyad, and group levels. At the individual level, we find that toxicity is spread across many users who are low to moderately toxic. At the dyad level, we observe that toxic replies are more likely to come from users who have no social connection to the poster and share few common friends with them. At the group level, we find that toxic conversations tend to have larger, wider, and deeper reply trees, but sparser follow graphs. To test the predictive power of conversational structure, we consider two prediction tasks. In the first, we demonstrate that structural features can be used to predict, from as few as the first ten replies, whether a conversation will become toxic. In the second, we show that the structural characteristics of a conversation are also predictive of whether the next reply posted by a specific user will be toxic. We observe that the structural and linguistic characteristics of conversations are complementary in both prediction tasks. Our findings inform the design of healthier social media platforms and demonstrate that models based on the structural characteristics of conversations can be used to detect early signs of toxicity and potentially steer conversations in a less toxic direction.