cff-version: 1.2.0
message: "If you use this dataset, please cite it using the metadata from this file."
type: dataset
abstract: "We start by identifying U.S.-based software organizations in the computer programming and data processing industry (SIC 737), a knowledge-intensive, high-growth setting. We integrate two main data sources. First, to collect the knowledge-based measures, we use publicly available data provided by the U.S. Patent and Trademark Office (USPTO). Using the General Architecture for Text Engineering (GATE) software, we design queries that retrieve the complete class and subclass information for each patent, as well as citations, inventors, and total patents granted between 1998 and 2011 inclusive. We aggregate the data by organization-year observation at the class and subclass levels and use these aggregated measures to compute the knowledge-based predictors and covariates. To compute moving averages for some variables, we collect five additional years of USPTO data, so our knowledge dataset spans 1993 to 2011. Second, we use Compustat to collect organization-level control variables such as assets, number of employees, market valuation, R&D expenditures, intangibles, solvency, and slack. Integrating the two datasets yields a final panel of 100 organizations, with 3.2 years of observations per organization on average, covering 1998 to 2011."
authors:
  - family-names: Vlas
    given-names: Cristina
title: "Software firms dataset about diversification and interdependence"
version: 1
identifiers:
  - type: doi
    value: 10.4121/7349e277-d28c-48e6-953b-93e61654ef00.v1
license: CC0-1.0
date-released: 2023-09-13