Software firms dataset about diversification and interdependence

DOI:10.4121/7349e277-d28c-48e6-953b-93e61654ef00.v1

The DOI displayed above is for this specific version of this dataset, which is currently the latest. Newer versions may be published in the future. For a link that will always point to the latest version, please use
DOI: 10.4121/7349e277-d28c-48e6-953b-93e61654ef00

Datacite citation style

Vlas, Cristina (2023): Software firms dataset about diversification and interdependence. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/7349e277-d28c-48e6-953b-93e61654ef00.v1

Dataset

Keywords

diversification innovation interdependence performance technology

Time coverage

1998-2011

Licence

CC0

by Cristina Vlas

We start by identifying U.S.-based software organizations in the computer programming and data processing industry (SIC 737), a knowledge-intensive, high-growth setting. We integrate two main data sources. First, to collect the knowledge-based measures, we use publicly available data from the U.S. Patent and Trademark Office (USPTO). Using the General Architecture for Text Engineering (GATE) software, we design queries that retrieve the complete class and subclass information for each patent, as well as citations, inventors, and total patents granted between 1998 and 2011 inclusive. We aggregate the data into organization-year observations at the class and subclass levels and use these aggregated measures to compute the knowledge-based predictors and covariates. To compute moving averages for some variables, we collect five additional years of USPTO data, so the knowledge dataset spans 1993 to 2011. Second, we use Compustat to collect organization-level control variables such as assets, number of employees, market valuation, R&D expenditures, intangibles, solvency, and slack. Integrating the two datasets yields a final panel of 100 organizations, observed between 1998 and 2011, with an average of 3.2 years of observations per organization.
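The aggregation and integration steps described above can be sketched roughly as follows. This is an illustration only, not the author's actual pipeline. The record fields (`org`, `year`, `employees`), the function names, the choice of a trailing window over the five preceding years, and the inner join on organization-year are all assumptions made for the example. The dataset description does not specify them.

```python
from collections import Counter


def count_by_org_year(patents):
    """Count granted patents per (organization, grant year) pair."""
    return Counter((p["org"], p["year"]) for p in patents)


def trailing_mean(counts, org, year, window=5):
    """Mean annual patent count over the `window` years before `year`.

    Assumption: the moving average looks back over the preceding years only,
    which is why patent data starting five years before the panel is needed.
    """
    prior = [counts.get((org, y), 0) for y in range(year - window, year)]
    return sum(prior) / window


def build_panel(patents, controls, start=1998, end=2011, window=5):
    """Join patent-based measures with financial controls by organization-year.

    `controls` maps (org, year) to a dict of control variables (hypothetical
    layout). Only organization-years present in both sources are kept.
    """
    counts = count_by_org_year(patents)
    panel = []
    for (org, year), ctrl in sorted(controls.items()):
        if not (start <= year <= end):
            continue
        if (org, year) not in counts:
            continue  # inner join: drop years with no patent record
        row = {
            "org": org,
            "year": year,
            "patents": counts[(org, year)],
            "patents_ma5": trailing_mean(counts, org, year, window),
        }
        row.update(ctrl)
        panel.append(row)
    return panel


# Toy records, invented purely for illustration.
sample_patents = [
    {"org": "A", "year": 1993},
    {"org": "A", "year": 1995},
    {"org": "A", "year": 1995},
    {"org": "A", "year": 1997},
    {"org": "A", "year": 1997},
    {"org": "A", "year": 1998},
    {"org": "A", "year": 1998},
    {"org": "B", "year": 1999},
]
sample_controls = {
    ("A", 1998): {"employees": 120},
    ("A", 1999): {"employees": 130},
    ("B", 1999): {"employees": 40},
}
demo_panel = build_panel(sample_patents, sample_controls)
```

Under these assumptions, an organization contributes a row only for years in which it appears in both sources, which is one way a panel can end up unbalanced with only a few observed years per organization.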

History

2023-09-13 first online, published, posted

Publisher

4TU.ResearchData

Format

.xlsx

Organizations

University of Massachusetts, Isenberg School of Management

DATA

Files (1)

KDIFdataset-051222-v2.xlsx
4,248,783 bytes
MD5: 0eae65a4c9698804c8c9936301fd44a0
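After downloading the file, the MD5 checksum listed for it can be used to confirm that the copy is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum as listed on the dataset page for the .xlsx file.
EXPECTED_MD5 = "0eae65a4c9698804c8c9936301fd44a0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.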