[UA-EAI] Check code automatically for UA issues? GitHub Code Scanning, CodeQL

Sat May 30 04:40:10 UTC 2020

UA EAI WGs:

As I mentioned in our meeting on 26 May, I recently found out about a 
capability which opens the possibility for scanning code automatically 
for UA issues. It would take a technical effort to adapt existing 
technology for UA purposes, and a measurement campaign to apply the UA 
scanning to the repositories at GitHub and elsewhere.  But if we could 
do it, we might be able to multiply our impact on open-source code.

GitHub announced a service called Code Scanning earlier this month at their 
GitHub Satellite conference. Code Scanning is a service for running 
automated queries which look for security vulnerabilities in source 
code. They plan to run these queries on pull requests, and periodically 
on the master branch, of open source repositories in GitHub. The queries 
are written in a language called CodeQL. This language treats source 
code as data to be parsed and queried.

Presently, they have scans for security vulnerabilities and secrets 
disclosure. For example, a query can read the source code, and detect 
that a value is accepted as user input by module A, passed through 
module B, then used in a database operation in module C, without being 
sanitised against malicious input.  Or a query can look at a password 
parameter passed to a system API, and determine that the value of that 
password parameter is stored in plain view in the source code.
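
To make the "source code as data" idea concrete, here is a minimal sketch of the second kind of check. It is not CodeQL; it uses Python's standard `ast` module as a stand-in, and the function name `find_hardcoded_passwords` and the sample `db.connect` calls are hypothetical, chosen only for illustration. It flags any call that passes a string literal as a `password=` keyword argument.

```python
import ast

def find_hardcoded_passwords(source: str) -> list[tuple[int, str]]:
    """Return (line number, callee) for calls that pass a string literal as password=."""
    findings = []
    # Parse the source into a syntax tree, then visit every node in it.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            # A literal string here means the secret sits in the source code itself.
            if (kw.arg == "password"
                    and isinstance(kw.value, ast.Constant)
                    and isinstance(kw.value.value, str)):
                findings.append((node.lineno, ast.unparse(node.func)))
    return findings

# Hypothetical code under inspection: one literal password, one read from the environment.
SAMPLE = '''
import os
db.connect(user="app", password="hunter2")
db.connect(user="app", password=os.environ["DB_PASSWORD"])
'''

for line, callee in find_hardcoded_passwords(SAMPLE):
    print(f"line {line}: {callee}() receives a hard-coded password")
```

A real CodeQL query would go further and follow the value through variables and function calls, which this sketch does not attempt.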

If CodeQL queries can do that sort of detection, then it seems to me we 
might be able to get queries written that detect which URL or domain 
name class a Java program uses. Or we might be able to detect that an 
email address is compared to a regular expression. Or perhaps other 
UA-obstructing behaviour. Might the Technology WG want to take on the 
task of figuring out how to get the queries written?
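
As a rough illustration of what such a UA query might look for, here is a sketch of the "email address compared to a regular expression" case. It is an assumption-laden stand-in, not a CodeQL query: it inspects Python source rather than Java, uses Python's `ast` module, and relies on a crude heuristic of my own (a literal pattern passed to an `re` function that contains `@` and an ASCII letter range such as `a-z`). The name `find_ascii_email_regexes` is hypothetical.

```python
import ast

# Functions in Python's re module that take the pattern as their first argument.
REGEX_FUNCS = {"match", "fullmatch", "search", "compile"}

def find_ascii_email_regexes(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern) for re.* calls whose literal pattern
    looks like an ASCII-only email check (heuristic, may over- or under-report)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only consider calls written as re.<func>(pattern, ...).
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "re"
                and node.func.attr in REGEX_FUNCS
                and node.args):
            continue
        pattern = node.args[0]
        if not (isinstance(pattern, ast.Constant) and isinstance(pattern.value, str)):
            continue
        text = pattern.value
        # Heuristic: an "@" plus an ASCII letter range suggests an email check
        # that would reject internationalized addresses.
        if "@" in text and "a-z" in text.lower():
            findings.append((node.lineno, text))
    return findings

# Hypothetical code under inspection: one email check, one unrelated date check.
SAMPLE = r'''
import re
ok = re.fullmatch(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", addr)
date = re.match(r"\d{4}-\d{2}-\d{2}", text)
'''

for line, pattern in find_ascii_email_regexes(SAMPLE):
    print(f"line {line}: possible ASCII-only email regex: {pattern}")
```

The equivalent check for Java, and the detection of which URL or domain name class a program uses, would need queries written against that language's syntax tree; this sketch only shows the general shape of the idea.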

If we have such queries, perhaps we could persuade GitHub to scan for UA 
problems in addition to security and secrets problems. Or at the very 
least, we can post the queries so that projects hosted at GitHub and 
elsewhere could run them of their own accord. Might the Measurement WG 
want to figure out how to plug this into the GitHub code scanning service?

See a 23-minute introduction video at 
<https://githubsatellite.com/schedule/#stopping-vulnerabilities-at-the-source> 
(or <https://youtu.be/58N0_0HCDPE>).
News article "GitHub Code Scanning aims to prevent vulnerabilities in 
open source software" 
<https://www.helpnetsecurity.com/2020/05/08/github-code-scanning/>
<https://lgtm.com/> is, I believe, the originator of the CodeQL 
technology, before GitHub acquired them.
<https://lgtm.com/help/lgtm/about-lgtm> is a starting point to learn 
about CodeQL.

If there is interest in talking about this possibility at the WG 
meetings, I am happy to share what I know so far. (But I don't know 
much, and most of it is already in this email.) I have already shared 
this news with the UA Technology and UA Measurement working groups.

Best regards,
       —Jim DeLaHunt, software engineer, Vancouver, Canada
