Everyone who writes complex SQL (or dare I say,
move business logic into the database?) wishes for parameterised views from time to time. This hypothetical syntax would be terrific:
CREATE VIEW v_categories_per_actor(
p_actor_id NUMBER
) AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = p_actor_id
Or as user-defined functions if you insist (as available in SQL Server):
CREATE FUNCTION v_categories_per_actor(
p_actor_id NUMBER
) RETURNS TABLE AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = p_actor_id
As always on this blog,
I’m using the useful Sakila database for examples.
The above parameterised view could now be used very nicely with awesome SQL features like
APPLY
/
LATERAL
:
-- Oracle 12c / SQL Server syntax:
SELECT a.first_name, a.last_name, c.name
FROM actor a
OUTER APPLY v_categories_per_actor(a.actor_id)
-- Oracle 12c / PostgreSQL / SQL standard syntax:
SELECT a.first_name, a.last_name, c.name
FROM actor a
LEFT JOIN LATERAL v_categories_per_actor(a.actor_id) ON 1 = 1
More info about LATERAL here. The good news about the above is: SQL Server supports this through
inline table valued functions, where the database optimiser can actually go inside the function / view and optimise it away, potentially implementing it with ordinary JOIN algorithms!
Unfortunately, this isn’t true for Oracle (yet). Even in Oracle 12c (where
APPLY
/
LATERAL
is now supported), table valued functions are opaque for the optimiser, which cannot “peek inside” to apply nice SQL transformation operations to optimise such statements.
People emulate parameterised views with SYS_CONTEXT
A lot of people use Oracle’s
SYS_CONTEXT feature to emulate parameterised views:
A
SYS_CONTEXT
is essentially a global variable (per session), which you can set e.g. when getting a connection out of your connection pool in your Java client application. From then on, all SQL statements that use
SYS_CONTEXT
are globally “parameterised”. For example, our previous view becomes this:
CREATE VIEW v_categories_per_actor AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
Now, of course, this view cannot be used (yet), because it will always return an empty result:
SELECT *
FROM v_categories_per_actor
ORDER BY name;
Which yields:
NAME
----
Now, let’s do that
SYS_CONTEXT
initialisation magic:
-- Just some boilerplate you have to do once
CREATE CONTEXT my_app USING set_ctx;
CREATE OR REPLACE PROCEDURE set_ctx(
p_actor_id NUMBER := NULL
) IS
BEGIN
dbms_session.set_context('MY_APP', 'ACTOR_ID', p_actor_id);
END;
/
-- This will now set the ACTOR_ID for our session
EXEC set_ctx(1);
If we re-run our previous statement that queries the view now:
SELECT *
FROM v_categories_per_actor
ORDER BY name;
We’ll get an actual result!
NAME
-----------
Animation
Children
Classics
Comedy
Documentary
Family
Foreign
Games
Horror
Music
New
Sci-Fi
Sports
Now, of course this is nowhere near as powerful as actual parameterised views, because I cannot lateral join this thing to an actual actor table, setting that context on each row. But it already helps if you want your Java client application be able to set a parameter for such a view. Let’s call it:
Oracle’s poor man’s parameterised views
How does SYS_CONTEXT impact performance?
A client of mine makes heavy use of
SYS_CONTEXT
for security purposes, in order to implement something similar to row-level security. I’ll show an example later on.
You’ll also
find a simpler “security” use-case on an interesting Ask Tom article:
Note how in the WHERE clause, I always use the SYS_CONTEXT function. I never put the literal value into the query—that would be very bad for Performance Scalability Shared pool utilization Perhaps most important: security (SQL injection)
I don’t really agree with this, because in my opinion, bind variables are a much better approach to this simplistic security concern.
In addition to that, since Oracle 11g, we have
adaptive cursor sharing and bind variable peeking, which means that the same query can produce distinct execution plans depending on the actual bind variable values. This isn’t (currently) being done for
SYS_CONTEXT
. See also this very interesting article by
Connor McDonald:
https://connormcdonald.wordpress.com/2016/10/20/taking-a-peek-at-sys_context
Connor was kind enough to write this article as a reply to a question about
SYS_CONTEXT
I asked on Twitter:
Not peeking at
SYS_CONTEXT
may be good or bad, depending on your taste and on your opinion about
Oracle 11g’s adaptive cursor sharing feature. I’m personally a big fan of it, although I do appreciate that some experts don’t like it too much (
see also Tony Hasler’s opinions in Expert Oracle SQL).
Row level security with SYS_CONTEXT
The important thing is just to remember, that no peeking is done at
SYS_CONTEXT
values. What does this mean?
This client of mine has always used
SYS_CONTEXT
just like I mentioned before, for security reasons. More particularly, they implemented sophisticated row-level security with it (they did this before Oracle had out of the box support for row-level security). Again, if you look at the previous query:
CREATE VIEW v_categories_per_actor AS
SELECT DISTINCT c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
You could interpret this as being a security feature. Actors can only access
their own data across the entire database. In this case, if clients don’t get GRANTs to tables, but only to these views, it’s possible to prevent access to all sorts of data that is not related to
“my own actor_id”, even if client code forgets to add the appropriate predicate.
The execution plan for a query against this view shows that it’s quite decent:
-----------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13 |
| 1 | SORT ORDER BY | | 1 | 16 | 13 |
| 2 | VIEW | V_CATEGORIE..| 1 | 16 | 13 |
| 3 | HASH UNIQUE | | 1 | 16 | 13 |
|* 4 | HASH JOIN | | 1 | 27 | 19 |
| 5 | NESTED LOOPS | | 1 | 27 | 19 |
| 6 | INDEX FAST FULL SCAN| PK_FILM_CAT..| 1 | 1000 | 1000 |
|* 7 | INDEX UNIQUE SCAN | PK_FILM_ACTOR| 1000 | 1 | 19 |
| 8 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
-----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
7 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID'))
AND "FC"."FILM_ID"="FA"."FILM_ID")
As you can see, the cardinality estimates are all pretty OK and this query does the right thing as the database knows it can use the single
SYS_CONTEXT
value for a fast access predicate on the
FILM_ACTOR
primary key.
Let’s ignore for a moment the slightly wrong cardinality estimate on
PK_FILM_ACTOR
, whose effects can be fixed by using a
/*+LEADING(fa fc)*/
hint inside of the view. But this is not related to using
SYS_CONTEXT
.
Using SYS_CONTEXT to overload view behaviour
Now, that particular client evaluated whether
SYS_CONTEXT
can be used for something entirely else: To put two
UNION ALL
subqueries in a view and depending on the “session mode”, they wanted to execute one or the other query. Let’s assume you have different types of users in your client application:
- Normal customers: “C”
- Operators: “O”
The latter have more privileges and can see entirely different data. In the terms of our view this might mean the following:
CREATE OR REPLACE VIEW v_categories_per_actor AS
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'
UNION ALL
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE sys_context('MY_APP', 'USER_TYPE') = 'O'
What this means now is that the same view is reused for both types of users: customers/actors (C) and operators (O). Customers only get to see their own data whereas operators get to see all data. Imagine that the real query is much more complex.
So, this is really nice, because then you can start reusing complex views and put these views in other complex views and the behaviour of your entire application starts changing depending on who is logging in. So, this shouldn’t just be called row level security, it should be called access control list, because that’s what they’re really doing.
Excellent!
So where’s the problem? The problem lies in the execution plans. Let’s update our procedure again:
CREATE OR REPLACE PROCEDURE set_ctx(
p_actor_id NUMBER := NULL
) IS
BEGIN
dbms_session.set_context('MY_APP', 'ACTOR_ID', p_actor_id);
dbms_session.set_context('MY_APP', 'USER_TYPE',
CASE WHEN p_actor_id IS NULL THEN 'O' ELSE 'C' END);
END;
/
EXEC set_ctx(1);
Not high security. If we initialise the context with an
P_ACTOR_ID
, then we’re a customer (C), otherwise, we’re an operator (O). What’s our plan now?
SELECT * FROM TABLE (dbms_xplan.display_cursor(
sql_id => 'cmdfbydppfqbu',
format => 'ALLSTATS LAST'
));
Which yields…
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13 |
| 1 | SORT ORDER BY | | 1 | 2275 | 13 |
| 2 | VIEW | V_CATEGORIES_PER_ACTOR | 1 | 2275 | 13 |
| 3 | UNION-ALL | | 1 | | 13 |
| 4 | HASH UNIQUE | | 1 | 12 | 13 |
|* 5 | FILTER | | 1 | | 19 |
|* 6 | HASH JOIN | | 1 | 27 | 19 |
| 7 | NESTED LOOPS | | 1 | 27 | 19 |
| 8 | INDEX FAST FULL SCAN| PK_FILM_CATEGORY | 1 | 1000 | 1000 |
|* 9 | INDEX UNIQUE SCAN | PK_FILM_ACTOR | 1000 | 1 | 19 |
| 10 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
| 11 | HASH UNIQUE | | 1 | 2263 | 0 |
|* 12 | FILTER | | 1 | | 0 |
|* 13 | HASH JOIN | | 0 | 5462 | 0 |
|* 14 | HASH JOIN | | 0 | 1000 | 0 |
| 15 | TABLE ACCESS FULL | CATEGORY | 0 | 16 | 0 |
| 16 | INDEX FAST FULL SCAN| PK_FILM_CATEGORY | 0 | 1000 | 0 |
| 17 | INDEX FAST FULL SCAN | PK_FILM_ACTOR | 0 | 5462 | 0 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
6 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
9 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID'))
AND "FC"."FILM_ID"="FA"."FILM_ID")
12 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='O')
13 - access("FC"."FILM_ID"="FA"."FILM_ID")
14 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
As you can see, the first part of the
UNION-ALL
concatenation is still roughly the same, except we now got this
FILTER
operation on operation #5. The second part of the
UNION-ALL
operation, however, got its cardinalities completely wrong. Operation 11 estimates 2263 rows even if there are none.
The E-Rows column (estimated rows) estimate that all data is selected from our Sakila database, i.e. 1000
FILM_CATEGORY
relationships and 5462
FILM_ACTOR
relationships, which is all of our data. But the A-Rows column (actual rows) is zero, as expected, because we set our
USER_TYPE
context value to
'C'
for customer, not
'O'
for operator.
Interesting also the Starts column, which shows that the operations below the
FILTER
operation on line #12 aren’t started. This was also confirmed by Franck Pachot in that interesting Twitter conversation:
And also by Chris Antognini:
Good news, no?
Yes and no.
- YES: Because even if the plan looks quite bad (for customer usage), it performed optimally. At least, Oracle knew when to stop even if estimates were wrong.
- NO: Because all these cardinality (and cost) estimates will propagate leading to bigger and bigger errors, depending on how you use this view.
Check out the following query:
SELECT actor_id, name, COUNT(*)
FROM v_categories_per_actor ca
JOIN category c USING (name)
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id, actor_id)
GROUP BY actor_id, name
ORDER BY actor_id, name;
The above query could of course be stored in a view again to fit our security concept of giving grants only to views, not tables… In any case, what we’re doing here is the following:
We take all categories per actor from our previous view, then we want to count the number of total films in that category for that actor. If we’re still running this in a customer (C) context with
ACTOR_ID = 1
, we’ll get:
ACTOR_ID NAME COUNT(*)
---------- ------------------------- ----------
1 Animation 1
1 Children 1
1 Classics 2
1 Comedy 1
1 Documentary 1
1 Family 2
1 Foreign 1
1 Games 2
1 Horror 3
1 Music 1
1 New 2
1 Sci-Fi 1
1 Sports 1
Which translates to: I (my user =
ACTOR_ID = 1
) have played in these categories, and these categories have so many total films that I played in. Again, the actual query might be much more complex, where we can’t easily factor out things (e.g. avoid doubling access to various tables). I’m just trying to make a point here.
What’s the plan of this query? It’s very bad
---------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13 |
| 1 | SORT ORDER BY | | 1 | 2263 | 13 |
| 2 | HASH GROUP BY | | 1 | 2263 | 13 |
| 3 | NESTED LOOPS | | 1 | 3883 | 19 |
|* 4 | HASH JOIN | | 1 | 62130 | 247 |
|* 5 | HASH JOIN | | 1 | 2275 | 13 |
| 6 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
| 7 | VIEW | V_CATEGORIES_... | 1 | 2275 | 13 |
| 8 | UNION-ALL | | 1 | | 13 |
| 9 | HASH UNIQUE | | 1 | 12 | 13 |
|* 10 | FILTER | | 1 | | 19 |
|* 11 | HASH JOIN | | 1 | 27 | 19 |
| 12 | NESTED LOOPS | | 1 | 27 | 19 |
| 13 | INDEX FAST FULL SCAN| PK_FILM_CATEGORY | 1 | 1000 | 1000 |
|* 14 | INDEX UNIQUE SCAN | PK_FILM_ACTOR | 1000 | 1 | 19 |
| 15 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
| 16 | HASH UNIQUE | | 1 | 2263 | 0 |
|* 17 | FILTER | | 1 | | 0 |
|* 18 | HASH JOIN | | 0 | 5462 | 0 |
|* 19 | HASH JOIN | | 0 | 1000 | 0 |
| 20 | TABLE ACCESS FULL | CATEGORY | 0 | 16 | 0 |
| 21 | INDEX FAST FULL SCAN| PK_FILM_CATEGORY | 0 | 1000 | 0 |
| 22 | INDEX FAST FULL SCAN | PK_FILM_ACTOR | 0 | 5462 | 0 |
| 23 | INDEX FAST FULL SCAN | PK_FILM_ACTOR | 1 | 5462 | 5462 |
|* 24 | INDEX UNIQUE SCAN | PK_FILM_CATEGORY | 247 | 1 | 19 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("CA"."ACTOR_ID"="FA"."ACTOR_ID")
5 - access("CA"."NAME"="C"."NAME")
10 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
11 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
14 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID'))
AND "FC"."FILM_ID"="FA"."FILM_ID")
17 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='O')
18 - access("FC"."FILM_ID"="FA"."FILM_ID")
19 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
24 - access("FC"."FILM_ID"="FA"."FILM_ID" AND "C"."CATEGORY_ID"="FC"."CATEGORY_ID")
Why is it so bad?
Observe how at some point, I had a cardinality estimate of 62130 (operation #4), and the whole query was still expected to return 2263 rows (operation #1), when in fact, I got only 13.
Even if the optimiser got the number of rows almost right for the first
UNION ALL
subquery (estimated 12, got 13 on operation #9), the estimate for the second
UNION ALL
subquery made it think that with so many rows coming out of the view (2263 on operation #16), a hash join will be optimal to count the number of films (operation #4 and #5). While the actual numbers aren’t as bad as it was estimated, the hash join operation is much more costly for small data sets, than an equivalent nested loop join operation.
If we remove again that
UNION ALL
operation from the view, restricting the view back to the original customer (C) only use case:
CREATE OR REPLACE VIEW v_categories_per_actor AS
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'
… then, we get a much better plan for that aggregation, where we now get those nested loops:
-------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 13 |
| 1 | SORT ORDER BY | | 1 | 12 | 13 |
| 2 | HASH GROUP BY | | 1 | 12 | 13 |
| 3 | VIEW | VM_NWVW_1 | 1 | 47 | 19 |
| 4 | HASH UNIQUE | | 1 | 47 | 19 |
|* 5 | FILTER | | 1 | | 33 |
| 6 | NESTED LOOPS | | 1 | 47 | 33 |
|* 7 | HASH JOIN | | 1 | 746 | 361 |
| 8 | NESTED LOOPS | | 1 | 27 | 19 |
|* 9 | HASH JOIN | | 1 | 1000 | 1000 |
|* 10 | HASH JOIN | | 1 | 16 | 16 |
| 11 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
| 12 | TABLE ACCESS FULL | CATEGORY | 1 | 16 | 16 |
| 13 | INDEX FAST FULL SCAN| PK_FILM_CATEGORY | 1 | 1000 | 1000 |
|* 14 | INDEX UNIQUE SCAN | PK_FILM_ACTOR | 1000 | 1 | 19 |
|* 15 | INDEX RANGE SCAN | PK_FILM_ACTOR | 1 | 27 | 19 |
|* 16 | INDEX UNIQUE SCAN | PK_FILM_CATEGORY | 361 | 1 | 33 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter(SYS_CONTEXT('MY_APP','USER_TYPE')='C')
7 - access("FA"."ACTOR_ID"="FA"."ACTOR_ID")
9 - access("C"."CATEGORY_ID"="FC"."CATEGORY_ID")
10 - access("C"."NAME"="C"."NAME")
14 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID'))
AND "FC"."FILM_ID"="FA"."FILM_ID")
15 - access("FA"."ACTOR_ID"=TO_NUMBER(SYS_CONTEXT('MY_APP','ACTOR_ID')))
16 - access("FC"."FILM_ID"="FA"."FILM_ID" AND "C"."CATEGORY_ID"="FC"."CATEGORY_ID")
All the cardinality estimates are now much better, unsurprisingly. Let’s benchmark and measure, too:
SET SERVEROUTPUT ON
DECLARE
v_ts TIMESTAMP;
v_repeat CONSTANT NUMBER := 1000;
BEGIN
v_ts := SYSTIMESTAMP;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT actor_id, name, COUNT(*)
FROM v_categories_per_actor ca -- No UNION ALL
JOIN category c USING (name)
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id, actor_id)
GROUP BY actor_id, name
ORDER BY actor_id, name
) LOOP
NULL;
END LOOP;
END LOOP;
dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
v_ts := SYSTIMESTAMP;
FOR i IN 1..v_repeat LOOP
FOR rec IN (
SELECT actor_id, name, COUNT(*)
FROM v_categories_per_actor2 ca -- With UNION ALL
JOIN category c USING (name)
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id, actor_id)
GROUP BY actor_id, name
ORDER BY actor_id, name
) LOOP
NULL;
END LOOP;
END LOOP;
dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
END;
/
Which yields:
Statement 1 : +000000000 00:00:01.940000000
Statement 2 : +000000000 00:00:02.923000000
Clearly, that UNION ALL and its resulting hash joins are hurting us drastically!
Conclusion: Use SYS_CONTEXT with care
SYS_CONTEXT
can be useful to emulate parameterised views as we’ve seen. There’s nothing wrong about doing “extended row-level security” by adding
SYS_CONTEXT
predicates in views and granting access only to views, not tables, e.g.
CREATE OR REPLACE VIEW v_categories_per_actor AS
SELECT DISTINCT actor_id, c.name
FROM category c
JOIN film_category fc USING (category_id)
JOIN film_actor fa USING (film_id)
WHERE fa.actor_id = TO_NUMBER(sys_context('MY_APP', 'ACTOR_ID'))
AND sys_context('MY_APP', 'USER_TYPE') = 'C'
However, due to the lack of peeking at those
SYS_CONTEXT
values, we cannot profit from the adaptive cursor sharing feature. This might have been possible with real parameterised views (such as they are supported in SQL Server), but in this case, we don’t get multiple alternative execution plans for the same SQL query, depending on
SYS_CONTEXT
values. This has been shown by Connor McDonald in his blog post:
https://connormcdonald.wordpress.com/2016/10/20/taking-a-peek-at-sys_context
I do hope that a future version of Oracle will treat
SYS_CONTEXT
more like bind variables, because ultimately, that’s what they are: Constant external values for the scope of a single query execution, or “parameters” like in “parameterised views”. Until we have that (or real parameterised views), I strongly advise against using
SYS_CONTEXT
for the use-case that my client was testing (cutting off individual
UNION ALL
subtrees from execution plans).
But the good news is: There’s nothing wrong with the ordinary use-case (forming predicates in
WHERE
clauses).
PS: If you like the idea of peeking at
SYS_CONTEXT
, be sure to also leave a comment here:
https://community.oracle.com/ideas/15826Like this:
Like Loading...
How many distinct possible values are there for the (‘MY_APP’, ‘USER_TYPE’) parameter? I might have a better (more stable) solution.
In the real-world scenario, there are currently 4 values