Authors:
(1) Mohammadreza Hazhirpasand, University of Bern, Bern, Switzerland;
(2) Oscar Nierstrasz, University of Bern, Bern, Switzerland;
(3) Mohammad Ghafari, University of Auckland, Auckland, New Zealand.
Table of Links
- Abstract and I. Introduction
- II. Methodology
- III. Results and Discussion
- IV. Threats to Validity
- V. Related Work
- VI. Conclusions
- VII. Acknowledgments and References
II. METHODOLOGY
In the following, we explain the objectives of this study as well as the methodology used for data collection and analysis.
1) Objective: In this study, we pose the following research question: “What technical difficulties are prevalent among crypto libraries?” Answering it helps uncover the underlying reasons why developers’ performance varies in using crypto APIs.
The objectives of this research are as follows:
• Finding prevalent themes of technical challenges in using crypto libraries, which helps library designers improve the design of APIs.
• Identifying underlying factors, which helps team leaders become aware of areas where developers might encounter difficulties in using cryptography.
2) Selecting crypto libraries: We aim at studying posts associated with popular crypto libraries on Stack Overflow. We assumed that discussions related to crypto libraries contain the name of the library as a tag. Hence, we selected the “cryptography” tag, i.e., the base tag, to observe what other tags were used together with it. We used Stack Exchange Data Explorer to run a query that fetches the tags appearing together with cryptography.[1] We found 2 184 such tags, i.e., candidate tags. Two authors of this paper separately checked each of the candidate tags and selected the ones that are crypto libraries. They used the internet to explore any tag about which they were uncertain. Then, they cross-checked their choices and discussed the tags. They concluded that tags that do not represent a crypto library, or that only provide a particular, limited service in cryptography (e.g., hashing), should not be considered. As a result, 6 tags were eliminated from the list, namely rsacryptoserviceprovider, aescryptoserviceprovider, rijndaelmanaged, bcrypt, javax.crypto, and hashlib. Each of these tags is either a crypto class, a namespace, or a module dedicated only to hashing. Ultimately, they agreed on a list of 20 crypto libraries, illustrated in Table I.
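The co-occurrence step described above can be sketched as follows. This is a minimal illustration, not the authors' actual Data Explorer query: the function name `co_occurring_tags` and the sample posts are hypothetical, and each post is assumed to be available as a plain list of tag names.

```python
from collections import Counter

BASE_TAG = "cryptography"


def co_occurring_tags(posts, base_tag=BASE_TAG):
    """Count the tags that appear on the same post as base_tag.

    posts is an iterable of tag lists, one list per post (an assumed
    input format chosen for this sketch).
    """
    counts = Counter()
    for tags in posts:
        tag_set = set(tags)
        if base_tag in tag_set:
            # Every other tag on this post is a candidate tag.
            counts.update(tag_set - {base_tag})
    return counts


# Hypothetical sample data: each entry is the tag list of one post.
sample_posts = [
    ["cryptography", "openssl", "c"],
    ["cryptography", "bouncycastle", "java"],
    ["openssl", "ssl"],  # no base tag, so it is ignored
    ["cryptography", "openssl"],
]

candidates = co_occurring_tags(sample_posts)
for tag, count in candidates.most_common():
    print(tag, count)
```

The resulting candidate list still needs the manual review described above, since a co-occurring tag may be a language, a class, or a namespace rather than a crypto library.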
3) Crypto libraries: The selected crypto libraries are all widely used in practice and have been examined in research projects. For instance, six of the selected libraries, i.e., OpenSSL, libsodium, Bouncy Castle, SJCL, Crypto-JS, and PyCrypto, were studied for finding usability issues [6]. MCrypt is a successor to the Unix crypt command that supports modern encryption algorithms. [2] The phpseclib library offers pure-PHP implementations of SSH2, SFTP, RSA, DSA, and many other algorithms. [3] Crypto++ and Botan are both C++ crypto libraries that support a wide range of crypto algorithms and security protocols. [4] [5] The Microsoft CryptoAPI interface enables developers to add authentication, encoding, and encryption to Windows-based applications. [6] Jasypt and Java Cryptography Architecture (JCA) are both intended for Java developers, and the latter is part of the Java security API. [7] [8] The Web Crypto API is intended to provide basic cryptographic operations for web applications and defines cryptographic primitives in a native JavaScript API. [9] The wolfSSL TLS library is a lightweight, C-language-based library designed for IoT, embedded systems, and smart grids. [10] There are also popular OpenSSL wrappers in other languages, such as nodecrypto in Node.js and pyOpenSSL in Python. Numerous studies have investigated the security of the aforementioned crypto libraries and examined their strengths and weaknesses [7] [8] [9]. However, the security evaluation of these crypto libraries falls outside the scope of this paper.
4) Manual investigation: In total, 24 648 posts contained the selected crypto libraries’ tags. We computed the required sample size for this population with a confidence level of 95% and a margin of error of 4.34%, which results in sampling 500 posts. We then selected an equal number of posts, 25, from each tag (i.e., each crypto library). We queried the posts containing a crypto library tag, e.g., OpenSSL, and set the search criteria to “recent activity”, so that Stack Overflow returns the most recently active discussions. Since we observed questions that were either unanswered or had received negative votes, we chose only posts whose question had received at least one upvote and at least one answer. The list of the selected questions is available online.[11]
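The sample size reported above can be reproduced with a short calculation. The paper does not state which formula was used; the sketch below assumes Cochran's formula with a finite population correction, a z-score of 1.96 for 95% confidence, and the conservative proportion p = 0.5. The function name `sample_size` is hypothetical.

```python
import math


def sample_size(population, margin_of_error, z=1.96, p=0.5):
    """Required sample size for a finite population.

    Assumes Cochran's formula with a finite population correction;
    z=1.96 corresponds to a 95% confidence level, and p=0.5 is the
    most conservative choice for the expected proportion.
    """
    # Sample size for an infinitely large population.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Correct for the finite population, then round up.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


total_posts = 24648   # posts carrying one of the selected tags
num_libraries = 20    # selected crypto libraries

n = sample_size(total_posts, 0.0434)
posts_per_library = n // num_libraries

print(n, posts_per_library)
```

Under these assumptions the calculation yields 500 posts, which divides evenly into 25 posts for each of the 20 libraries.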
Thereafter, we employed thematic analysis, a qualitative research method for finding themes in texts [10], to deduce the frequent topics from the chosen posts. Since our study is exploratory in nature, we did not devise a list of themes prior to studying the posts. Hence, in order to link each post to a suitable theme, two authors of the paper separately studied the posts and deduced the main issue (i.e., theme) of each post. The reviewers carefully reviewed the title, question body, and answer body of each post. Although a post may involve several crypto concepts, the reviewers’ objective was to find its key issue. They employed open coding, in which a short explanatory label was assigned to each post [11]. Each author repeated the coding phase three times to improve their deduced list of themes. To evaluate the inter-rater agreement between the two reviewers, we employed Cohen’s kappa [12]. The reviewers obtained a Cohen’s kappa score of 68%, which indicates substantial agreement. Finally, the two reviewers compared their lists and discussed any disagreements. The two reviewers had used different wording for building their lists of themes, and the total number of themes was not identical. In multiple sessions, they re-analyzed the posts on which they had different views. In some cases, they realized that one of the reviewers had broken down one theme into several sub-themes, which they then merged if necessary. Ultimately, they agreed on 10 themes for the analyzed posts.
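For readers unfamiliar with the agreement measure used above, the following is a minimal sketch of how Cohen's kappa is computed for two raters. It is not the authors' script: the function name `cohens_kappa` and the example labels are hypothetical, and it assumes both reviewers' labels have already been mapped to a shared vocabulary of themes.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in all_labels) / (n * n)

    if expected == 1:
        # Both raters used one identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical theme labels assigned by two reviewers to six posts.
reviewer_1 = ["key", "key", "cert", "cert", "hash", "hash"]
reviewer_2 = ["key", "key", "cert", "hash", "hash", "hash"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

In this toy example the reviewers agree on five of six posts (observed agreement 5/6) while chance agreement is 1/3, giving a kappa of 0.75.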
This paper is available on arXiv under the CC BY 4.0 DEED license.
[1] https://data.stackexchange.com/
[2] http://mcrypt.sourceforge.net
[3] https://github.com/phpseclib/phpseclib
[4] https://www.cryptopp.com
[5] https://botan.randombit.net
[6] https://docs.microsoft.com/en-us/windows/win32/seccrypto/cryptography-portal
[7] http://www.jasypt.org/
[8] https://www.oracle.com/java/technologies/javase/javase-tech-security.html
[9] https://www.w3.org/TR/WebCryptoAPI/
[10] https://www.wolfssl.com/
[11] http://crypto-explorer.com/crypto_libs/