Analyzing Textual Information

Johannes Ledolter - The University of Iowa, USA
Lea S. VanderVelde - The University of Iowa, USA

Volume: 188

Series: Quantitative Applications in the Social Sciences

May 2021 | 192 pages | SAGE Publications, Inc

Researchers in the social sciences and beyond increasingly deal with massive quantities of text data requiring analysis, from historical letters to the constant stream of content in social media. Traditional texts on statistical analysis have focused on numbers; this book provides a practical introduction to the quantitative analysis of textual data. Using up-to-date R methods, it takes readers through the text analysis process, from text mining and pre-processing to final analysis. It includes two major case studies, using historical and more contemporary text data, to demonstrate the practical applications of these methods. Currently, there is no introductory how-to book on textual data analysis with R that is up-to-date and applicable across the social sciences. Code and a variety of additional resources to enrich the use of this book are available on an accompanying website at: https://www.biz.uiowa.edu/faculty/jledolter/analyzing-textual-information/. These resources include data files from the 39th Congress and the collection of tweets of President Trump, which is no longer available to researchers through Twitter itself.

Available Formats

ISBN: 9781544390000	Paperback	Suggested Retail Price: $51.00	Bookstore Price: $40.80
ISBN: 9781544390031	Electronic Version	Suggested Retail Price: $39.00	Bookstore Price: $31.20

For assistance with your order: Please email us at textsales@sagepub.com or connect with your SAGE representative.

SAGE
2455 Teller Road
Thousand Oaks, CA 91320
www.sagepub.com

Series Editor’s Introduction

Preface

Acknowledgments

About the Authors

Chapter 1: Introduction

1.1 Text Data

1.2 The Two Applications Considered in This Book

1.3 Introductory Example and Its Analysis Using the R Statistical Software

1.4 The Introductory Example Revisited, Illustrating Concordance and Collocation Using Alternative Software

1.5 Concluding Remarks

1.6 References

Chapter 2: A Description of the Studied Text Corpora and A Discussion of Our Modeling Strategy

2.1 Introduction to the Corpora: Selecting the Texts

2.2 Debates of the 39th U.S. Congress, as recorded in the Congressional Globe

2.3 The Territorial Papers of the United States

2.4 Analyzing Text Data: Bottom-Up or Top-Down Analysis

2.5 References

Appendix to Chapter 2: The Complete Congressional Record

Chapter 3: Preparing Text for Analysis: Text Cleaning and Formatting

3.1 Text Cleaning

3.2 Text Formatting

3.3 Concluding Remarks

3.4 References

Chapter 4: Word Distributions: Document-Term Matrices of Word Frequencies and the “Bag of Words” Representation

4.1 Document-Term Matrices of Frequencies

4.2 Displaying Word Frequencies

4.3 Co-Occurrence of Terms in the Same Document

4.4 The Zipf Law: An Interesting Fact About the Distribution of Word Frequencies

4.5 References

Chapter 5: Metavariables and Text Analysis Stratified on Metavariables

5.1 The Significance of Stratification and the Importance of Metavariables

5.2 Analysis of the Territorial Papers

5.3 Analysis of Speeches From the 39th Congress

5.4 References

Chapter 6: Sentiment Analysis

6.1 Lexicons of Sentiment-Charged Words

6.2 Applying Sentiment Analysis to the Letters of the Territorial Papers

6.3 Using Other Sentiment Dictionaries and the R Software tidytext for Sentiment Analysis

6.4 Concluding Remarks: An Alternative Approach for Sentiment Analysis

6.5 References

Chapter 7: Clustering of Documents

7.1 Clustering Documents

7.2 Measures for the Closeness and the Distance of Documents

7.3 Methods for Clustering Documents

7.4 Illustrating Clustering Methods on a Simulated Example

7.5 References

Chapter 8: Classification of Documents

8.1 Introduction

8.2 Classification Procedures

8.3 Two Examples Using the Congressional Speech Database

8.4 Concluding Remarks on Authorship Attribution: Commenting on the Field of Stylometry

8.5 References

Chapter 9: Modeling Text Data: Topic Models

9.1 Topic Models

9.2 Fitting Topic Models to the Two Corpora Studied in This Book

9.3 References

Chapter 10: n-Grams and Other Ways of Analyzing Adjacent Words

10.1 Analysis of Bigrams

10.2 Text Windows to Measure Word Associations Within a Neighborhood of Words and a Discussion of the R Package text2vec

10.3 Illustrating the Use of n-Grams: Speeches of the 39th Congress

Chapter 11: Concluding Remarks

Appendix: Listing of Website Resources

The authors balance sophisticated analysis in R with the fundamentals of text mining so that all readers can understand them and apply them to their own analysis of text data.

Matthew Eshbaugh-Soha

University of North Texas

If you have a little experience with R, Ledolter and VanderVelde have created an accessible book for learning to analyze text. They provide a scaffolded experience with concrete examples and access to the text and code. They also provide technical information for those interested in a deeper dive into the material. Readers will feel comfortable analyzing their own text as they use the provided material and progress through the book. I will be adding this book to my applied practicum course.

James B. Schreiber

Duquesne University

Analyzing Textual Information
From Words to Meanings through Numbers
