Using Gephi to Analyze the Oracle SYS Schema
If you knew nothing about a database, how would you figure out the main tables and relationships of that DB? One way is to use Gephi, a network visualization tool. By “network”, I don’t necessarily mean the TCP/IP kind, although you can certainly use Gephi for that too. By “network”, I mean entities that are networked together by relationships. The most common example is a social network.
To experiment with Gephi, I used the tables and keys of the Oracle 11g SYS schema to create this page’s hero image. The following excerpt focuses on four groups. Starting with the Advisory wri$_adv_xxx group in light blue, I used Gephi to zoom into the largest tables, called nodes in Gephi. I gathered a few of these table names and put them into the following query, which revealed that these tables have task_id, or task_id plus another column, as their keys.
SELECT cols.table_name, cols.column_name, cols.position, cons.status, cons.owner, cons.*
FROM all_constraints cons
JOIN all_cons_columns cols ON cons.constraint_name = cols.constraint_name AND cons.owner = cols.owner
WHERE 1=1
AND cols.table_name IN ('WRI$_ADV_ACTIONS','WRI$_ADV_OBJECTS','WRI$_ADV_EXECUTIONS','WRI$_ADV_ADDM_TASKS')
AND cons.constraint_type IN ('P','U')
AND cols.owner = 'SYS'
AND cons.status = 'ENABLED'
ORDER BY cols.table_name, cons.constraint_type DESC, cols.position;
Moving clockwise, I then analyzed the wrm$_wr_control group in orange. The wrm$ tables store metadata information for the Automatic Workload Repository (AWR). Using the above example query or using the edge labels in Gephi, I determined that the wrm$_wr_control table relates to other nodes via the dbid key.
Also in the orange group are the wrh$_xxx tables, which are also part of AWR and store historical data or snapshots. These tables are related via the dbid+snapid+instance_number key. Note that several tables, like wrh$_sql_summary and wrh$_shared_server_summary, have the same keys, which is why Gephi rendered several orange nodes with the same size.
Lastly in the snapshot, we have the Advanced Queueing group in blue, which relate to each other via the msgid key. Again, we have several tables that define msgid as their key, so Gephi rendered several blue nodes with the same size.
The following excerpts depict the remaining major groups - the neon green, dbms_alert_info group related via the name+sid key and the hot pink, scheduler$_xxx group related via the obj# key.
If you are interested in how I generated the data that I imported into Gephi, here is the PL/SQL script and helper statements that I used. The script inserts nodes/tables into a global temporary table (GTT). In addition, the script inserts edges/keys into a GTT.
declare
cursor mycur is
SELECT acc.table_name, COUNT(*) AS pk_count,
'''' || LISTAGG(acc.column_name, ''',''') WITHIN GROUP (ORDER BY acc.column_name) || '''' AS pk_list,
LISTAGG(acc.column_name, '+') WITHIN GROUP (ORDER BY acc.column_name) AS pretty_pk_list
FROM all_constraints ac
JOIN all_cons_columns acc ON acc.constraint_name = ac.constraint_name AND ac.owner = acc.owner
WHERE 1=1
AND ac.owner = 'SYS'
AND ac.status = 'ENABLED'
AND ac.constraint_type IN ('P', 'U')
AND acc.column_name NOT IN ('NAME','ID','INST_ID','JOB_NAME') --Remove generic key names
GROUP BY acc.table_name;
TYPE cur_typ IS REF CURSOR;
d cur_typ;
query_str VARCHAR2(500);
query_str1 VARCHAR2(500);
query_owner VARCHAR2(30);
query_table_name VARCHAR2(30);
BEGIN
EXECUTE IMMEDIATE 'TRUNCATE TABLE my_edges_no_ids';
for c in mycur loop
BEGIN
dbms_output.put_line('processing table_name[' || c.table_name || '] pk_list[' || c.pk_list || '] pk_count[' || c.pk_count || ']');
query_str:='select owner, table_name ' ||
'from all_tab_columns atc ' ||
'where owner = ''SYS'' ' ||
' AND column_name IN ('||c.pk_list||') ' ||
' AND table_name != '''||c.table_name||''' ' ||
'group by owner, table_name ' ||
'having count(*) = ' || c.pk_count || ' ' ||
'ORDER BY atc.table_name';
OPEN d FOR query_str;
LOOP
FETCH d INTO query_owner, query_table_name;
EXIT WHEN d%NOTFOUND;
-- process row here
--dbms_output.put_line('found it in [' || query_table_name || ']');
query_str1 := 'insert into my_edges_no_ids(source_node,target_node,edge_type,edge_label) ' ||
'VALUES('''||c.table_name||''','''||query_table_name||''',''Directed'','''||c.pretty_pk_list||''')';
dbms_output.put_line(query_str1);
EXECUTE IMMEDIATE query_str1;
END LOOP;
CLOSE d;
END;
end loop;
EXECUTE IMMEDIATE 'TRUNCATE TABLE my_nodes';
INSERT INTO my_nodes (id, label, name)
SELECT ROWNUM, sub1.node, sub1.node
FROM (SELECT sub.node
FROM (SELECT meni1.source_node node
FROM my_edges_no_ids meni1
UNION
SELECT meni2.target_node node
FROM my_edges_no_ids meni2) sub
ORDER BY sub.node) sub1;
EXECUTE IMMEDIATE 'TRUNCATE TABLE my_edges';
INSERT INTO my_edges (source, target, type, label)
SELECT src.id, trg.id, meni.edge_type, meni.edge_label
FROM my_edges_no_ids meni
JOIN my_nodes src ON src.label = meni.source_node
JOIN my_nodes trg ON trg.label = meni.target_node;
end;
TRUNCATE TABLE my_edges_no_ids;
DROP TABLE my_edges_no_ids;
CREATE GLOBAL TEMPORARY TABLE my_edges_no_ids(
source_node VARCHAR2(30),
target_node VARCHAR2(30),
edge_type VARCHAR2(20),
edge_label VARCHAR2(500)
)
ON COMMIT PRESERVE ROWS;
--------------------------------------
TRUNCATE TABLE my_nodes;
DROP TABLE my_nodes;
CREATE GLOBAL TEMPORARY TABLE my_nodes(
id NUMBER,
label VARCHAR2(30),
name VARCHAR2(30)
)
ON COMMIT PRESERVE ROWS;
--------------------------------------
TRUNCATE TABLE my_edges;
DROP TABLE my_edges;
CREATE GLOBAL TEMPORARY TABLE my_edges(
source NUMBER,
target NUMBER,
type VARCHAR2(20),
label VARCHAR2(500)
)
ON COMMIT PRESERVE ROWS;
--------------------------------------
SELECT * FROM my_edges_no_ids a;
SELECT * FROM my_nodes;
SELECT * FROM my_edges;