About the queries
Exploring the queries
This space contains one page for each of the Java queries for the most recent enterprise release of CodeQL. Each page contains:
The queries available in this release include queries that:
The heatmap below shows the labels for Java queries. Click a label to view all queries with that tag or query type.
About the security queries
There are two query suites for Java security analysis: default and all. For most projects we recommend that you run queries from the default suite. The all suite contains a few additional rules, which test for local attacks and less severe issues. If you are concerned about local attacks or want to enable the other secondary rules, you can use the all suite. The SAMATE/Juliet test suite includes test cases for both remote and local attacks, so the all suite should be used when evaluating Semmle coverage for it.
Many queries rely on tracking the flow of data from sources that cannot be guaranteed to be safe. Where possible, the source of the potentially unsafe data is reported in the query violation message, indicating why that particular use of the data was considered dangerous. We describe here how some common sources of data are classified, to make it clear why they are considered potentially dangerous.
A common data source is data that comes from a user. This must be treated as untrusted unless it is validated, as a malicious user can send unexpected data that can have undesirable effects, such as allowing them to perform a SQL injection attack.
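As a minimal sketch of the SQL injection risk mentioned above, the following Java class contrasts concatenating user input into a query with binding it as a parameter. The class name `UserLookup` and the `users` table with its `id` and `name` columns are hypothetical and used only for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical example class; the table and column names are assumptions.
final class UserLookup {

    // UNSAFE: untrusted input is concatenated into the SQL text, so input
    // containing a quote character can change the structure of the query.
    static String buildUnsafeQuery(String userName) {
        return "SELECT id FROM users WHERE name = '" + userName + "'";
    }

    // SAFER: the SQL text is fixed and the input is bound as a parameter,
    // so it is passed to the database as data rather than as SQL.
    static boolean userExists(Connection connection, String userName) throws SQLException {
        String sql = "SELECT id FROM users WHERE name = ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, userName);
            try (ResultSet rows = statement.executeQuery()) {
                return rows.next();
            }
        }
    }

    public static void main(String[] args) {
        // The injected quote closes the string literal and adds a condition.
        System.out.println(buildUnsafeQuery("x' OR '1'='1"));
        // prints: SELECT id FROM users WHERE name = 'x' OR '1'='1'
    }
}
```

The unsafe variant only builds the string, so it can be inspected without a database; the safer variant needs a live JDBC `Connection` supplied by the caller.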
However, user input can vary in how untrustworthy it is based on the kind of user who supplies it. We find it useful to distinguish two cases:
Input that comes from a local user, such as someone who is logged in to your server and can run local commands.
Input that comes from a remote user over the network, such as someone who is using your application over the web.
In general, the vast majority of malicious users are remote. It is much less likely than it used to be that a company will have an application running on a server to which untrusted users might be able to gain local access. Indeed, if they have been able to do so, this is often a sign that the server has been entirely compromised.
For this reason, checking for security vulnerabilities that may be exploitable by a local user can produce very noisy results. In general, the results for remote user vulnerabilities will be much higher priority. Hence, some of our rules have two versions: one for local user input and one for remote user input. The local rules are not included in the default dashboard configuration, nor are the results for those queries reported here, but they may be enabled if desired.
The sources that we consider remote user input are listed first, followed by the sources that we consider local user input (starting with command line arguments):
- Getting a parameter from an HttpServletRequest (or equivalent in other web frameworks)
Spoofing a request can give the user control even over parameters which are not usually set by the user.
- Getting the query string from an HttpServletRequest (or equivalent in other web frameworks)
The query string comes from the URL, which is under the control of the user.
- Getting the header from an HttpServletRequest (or equivalent in other web frameworks)
Spoofing a request can give the user control over the header.
- Getting the value of a cookie
Cookies are controlled by the client, and so their values must be treated as untrusted.
- Getting the input stream of a URLConnection or Socket
Remote connections produce data that is controlled by the client.
- Getting the hostname of a request using reverse DNS
If the user controls their DNS server, then they can return whatever result they wish for a reverse DNS lookup.
- Accessing the parameter of a method that can be called by RMI
A remote user may be able to make an RMI call with arguments that they control.
- Parsing information from XML (Android)
XML data is often sourced from the network in Android.
- Getting the current URL from a WebView (Android)
The current URL is not necessarily under the control of the phone user, and may contain malicious content.
- Getting the value of the command line arguments
This data is sourced directly from a local user.
- Getting the value of a system environment variable
The environment in which a program is run is often under the control of the user who runs the program.
- Getting the value of a Java System property
The user who ran the application may set these at will.
- Getting the value of a property from a Properties
Properties objects are commonly written to and read from disk, which may be under the control of the user.
- Getting the output of a ResultSet
Databases may contain user input that has been stored, and as such must be treated as untrusted.
- Reading from a file
A local user may control the filesystem, and hence the contents of files.
- Reading from standard input
Reading from standard input will prompt a local user for input.
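To make a few of the local sources listed above concrete, here is a small sketch that reads a command line argument, an environment variable, and a Java System property using only standard library calls. The class name `LocalInputs` and the names passed to it are hypothetical; the point is only that each returned value is chosen by whoever launches the program.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that gathers values from three local input sources.
final class LocalInputs {

    static Map<String, String> collect(String[] args, String envName, String propertyName) {
        Map<String, String> values = new LinkedHashMap<>();
        // Command line argument: supplied directly by the local user.
        values.put("argument", args.length > 0 ? args[0] : null);
        // Environment variable: set by the environment that runs the program.
        values.put("environment", System.getenv(envName));
        // System property: can be set with -Dname=value when starting the JVM.
        values.put("property", System.getProperty(propertyName));
        return values;
    }

    public static void main(String[] args) {
        // "HOME" and "user.dir" are only example names; missing values are null.
        System.out.println(collect(args, "HOME", "user.dir"));
    }
}
```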
Direct vs. indirect vulnerabilities
Results involving remote user input usually indicate a direct vulnerability, meaning that the user input is propagated from a remote source (for example, an HTTP servlet parameter) directly to the sink (for example, part of a SQL command), resulting in a vulnerability without further propagation of the user input.
In an indirect vulnerability, remote user input may be propagated to a local environment without causing a vulnerability directly (e.g. by inserting remote user input into a database using a JDBC prepared statement). However, if the propagated user input is then retrieved and used elsewhere, an indirect vulnerability can still exist.
A common example of an indirect vulnerability is persistent cross-site scripting. In this kind of attack, user input is first inserted into a database and subsequently retrieved and inserted into an HTML page. It is the second step that is the cause of the vulnerability. Because the second step of such an indirect attack involves user input that may originate from a remote source but is legitimately inserted into a local environment, a local environment may become tainted even if it is not directly vulnerable to attack. This is important to consider when evaluating whether rules involving local user input are relevant to a specific code base and environment.
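The two steps of a persistent cross-site scripting attack can be sketched as follows. This is an illustration under stated assumptions, not code from any real application: the class `CommentPage` is hypothetical, an in-memory list stands in for the database, and `escapeHtml` is a hand-written helper covering only the five characters shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: an in-memory list stands in for a database table.
final class CommentPage {
    private final List<String> storedComments = new ArrayList<>();

    // Step 1: remote input is stored. Nothing dangerous happens yet.
    void save(String comment) {
        storedComments.add(comment);
    }

    // Step 2 (UNSAFE): stored text is written into the HTML page unchanged.
    String renderUnsafe() {
        StringBuilder html = new StringBuilder("<ul>");
        for (String comment : storedComments) {
            html.append("<li>").append(comment).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Step 2 (SAFER): stored text is escaped when it is written out.
    String renderEscaped() {
        StringBuilder html = new StringBuilder("<ul>");
        for (String comment : storedComments) {
            html.append("<li>").append(escapeHtml(comment)).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    // Minimal escaping of the characters that are significant in HTML text.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(ch);
            }
        }
        return out.toString();
    }
}
```

In this sketch the value read back from storage is the local source, which is why it must be treated as untrusted at the point where it is rendered.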
Security analysis testing
Summary of results
The following table summarizes the results for the latest release of the Java security queries run against the SAMATE Juliet 1.3 tests. In the table, each row represents a weakness, and the columns show the following information:
- TP – count of all true positive results: the code has a known security weakness, and the CodeQL analyses correctly identify this defect.
- FP – count of all false positive results: the code has no known security weakness, but the CodeQL analyses are overcautious and suggest a potential problem.
- TN – count of all true negative results: the code has no known security weakness, and the CodeQL analyses correctly pass the code as secure.
- FN – count of all false negative results: the code has a known security weakness, but the CodeQL analyses fail to identify this defect.
Interpreting the results
The report CAS Static Analysis Tool Study – Methodology, by the Center for Assured Software of the National Security Agency of the USA, defines four different ways to measure success:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F-score = 2 * (Precision * Recall) / (Precision + Recall)
- Discrimination rate = #discriminated tests / #tests
For each of these metrics, a higher score is better. There is usually a trade-off between precision and recall: tuning an analysis to improve one tends to reduce the other. The F-score, which is the harmonic mean of precision and recall, is an attempt to capture the balance between these two metrics in a single number.
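The four formulas above can be sketched in Java as follows. The class name `AnalysisMetrics` and the counts used in `main` are made up for illustration and are not results from the Juliet tests.

```java
// Hypothetical helper implementing the four metrics defined above.
final class AnalysisMetrics {

    // Precision = TP / (TP + FP)
    static double precision(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // Recall = TP / (TP + FN)
    static double recall(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    // F-score = 2 * (Precision * Recall) / (Precision + Recall)
    static double fScore(double precision, double recall) {
        return 2 * (precision * recall) / (precision + recall);
    }

    // Discrimination rate = #discriminated tests / #tests
    static double discriminationRate(int discriminatedTests, int tests) {
        return (double) discriminatedTests / tests;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 90 true positives, 10 false positives,
        // 30 false negatives.
        double p = precision(90, 10);
        double r = recall(90, 30);
        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("F-score   = " + fScore(p, r));
    }
}
```

Because the divisions are done in floating point, a weakness with no results at all (a zero denominator) yields `NaN` rather than throwing an exception; a caller may want to handle that case explicitly.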
The following table shows the results of calculating these metrics for the results shown above. These scores compare very favorably with the sample tools tested by the Center for Assured Software.
Key differences in expectations
There are some key differences between the expectations of the CodeQL analyses and the SAMATE Juliet test suite. These can be grouped as follows:
- Passing data via data structures
Data may be added to structures (for example, a Vector, LinkedList or HashMap), and then a different method may be used to extract a data element. Tracking the flow of data through these structures using static analysis is process-intensive and very error-prone. Detecting when two references to an object may point to the same object and determining when a specific data element is extracted is impossible to analyze accurately using static analysis. Consequently, we do not track this pattern of data usage and all tests based on Juliet test variants 72-74 result in false negative results. This affects the metrics for the identification of CWE-190 and CWE-331 vulnerabilities.
- Use of SSL
We recommend that you use SSL regardless of the level of sensitivity of the information being transferred. This is a more stringent rule than required to meet the recommendations in a CWE. The Juliet tests require SSL to be used only when sensitive data is used, so this is a source of false positive results. This affects the metrics for the identification of CWE-190 and CWE-331 vulnerabilities.
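The first difference above, passing data via data structures, has roughly the following shape. This is a simplified sketch of the pattern as described in the text, not code taken from the Juliet test suite; the class `ContainerFlow` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of data passing through a container.
final class ContainerFlow {

    // One method stores the untrusted value in a map next to a constant.
    static Map<Integer, String> store(String untrusted) {
        Map<Integer, String> container = new HashMap<>();
        container.put(0, "constant");
        container.put(1, untrusted);
        return container;
    }

    // A different method extracts one element. To follow the flow, an
    // analysis has to know which key holds the untrusted value.
    static String extract(Map<Integer, String> container) {
        return container.get(1);
    }

    public static void main(String[] args) {
        String value = extract(store(System.getProperty("user.name")));
        System.out.println(value);
    }
}
```

Deciding statically whether `extract` returns the constant or the untrusted value requires reasoning about the contents of the map, which is the kind of tracking the text describes as not being performed.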
The tests suggest that judicious choices have been made to balance the number of false positive results (an incorrect warning is issued) and false negative results (a true defect was not identified). Where comparative results are available for other tools, the CodeQL analyses stand out for their exceptional accuracy.