Thursday, 8 November 2012

XML API Research


1. xudml1.xsd file contains the grammar for the  structure of repository file:

C:\Oracle\Middleware\Oracle_BI1\bifoundation\server\bin\xudml1.xsd

An XML schema is a description of an XML document in terms of the constraint, content and the structure of an XML document. An example would be DTD (Document type definition), XSD. XSD is also known as XML Schema.

Well formed versus Valid

All xml documents must be well formed however it's not mandatory that they are valid as well. An xml document will be valid if it conforms to its schema definition.

commonly used terms while describing xml docs: tags, attribute, element.

Information about xml schema elements: w3schools

2. biserverxmlgen generates XML from an existing RPD


Before running the XML utilities, you must first run bi-init.cmd (or bi-init.sh on UNIX) to launch a command prompt or shell window that is initialized to your Oracle instance. You can find this utility in:
ORACLE_INSTANCE/bifoundation/OracleBIApplication/coreapplication/setup

C:\Oracle\Middleware\instances\instance1\bifoundation\OracleBIApplication\coreapplication\setup

e.g.

biserverxmlgen -R D:\xml_api\PS1_14Sept2012.rpd -P Admin123 -O   D:\xml_api\PS1_14Sept2012.xml -8

Sunday, 12 August 2012

OBIEE Learnings


1. OBIEE has tnsnames.ora file to connect to orcl datasources. Here is a sample location on tnsnames.ora in    OBIEE 11G:

C:\MW_HOME\Oracle_BI1\network\admin

 In order to be able to connect to Oracle data source to import Physical layer objects in Admin tool, this file should have an entry for the datasource:


(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1522)) (CONNECT_DATA =(SERVER = DEDICATED)  (SERVICE_NAME = orcl.domain_name) ) )

ORCL should be used in Admin tool as a data source, with appropriate schema name and password for importing physical table metatdata.

A sample tnsnames.ora entry that can be used in rpd directly:


(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 1522)) (CONNECT_DATA =(SERVER = DEDICATED)  (SERVICE_NAME = orcl.domain_name) ) )

there shouldn't be any newline character in the above string.

1.1 OBIEE log location:

   nqquery.log file contains the queries generated by OBIEE, it's location is:

C:\OBI_MW_HOME\instances\instance1\diagnostics\logs\OracleBIServerComponent\coreapplication_obis1\

2. Aliases are created in Physical layer of rpd to prevent extraneous joins. The fk physical joins which are defined for an alias are not present in the source physical table. for e.g. FACT_D1_ORDERS2 is an alias of D1_ORDERS2. FK physical joins are defined in this alias to other three dimension aliases:


These fk joins are not visible in the source table for this, i.e. D1_ORDERS2:

As shown above these fk physical joins are not displayed in the source physical table.

3. In the BMM layer the logical tables and columns are renamed to more sensible real worldly names rather than names in the physical layer which are more of db friendly. Each logical table has one or more logical table source.
4. Logical table keys:  Each logical dimension table must have a logical key, for a business model to be valid.
Logical keys can be composed of one or more logical columns, the logical key defines the most detail level of information of any source in the logical table. If the logical tables are created by dragging and dropping a physical table from physical layer into a business model, they'll be automatically created.
5. For a valid business model logical tables must be joined with logical joins. Cardinality relationships between logical tables is capture by logical joins. Logical Facts will always be at the "many" end of these cardinal relationships. When a query is sent to OBI server, it figures out how to form physical query by examining how the logical model is constructed.





Sunday, 29 July 2012

MS EXCEL

Problem 1: How to check if the values in colum A match with any of the values in column B?

Solution: Write this formula for column C =MATCH(A1, B$1:B$700,0) and drag down to 700th row.

param1: is the value which should be searched,
param2: is the range of cells in which it'll be searched for
param3: looks for exact match

Return value of this function is the index of the first cell in column B where the matched value is found.


More information about this is here

Tuesday, 19 June 2012

Fledgling PL/SQL Devleoper

PL/SQL : Procedure Language for SQL

nice step by step guide for running procedure from sql developer is here


PL/SQL code blocks comprise of statements. Each statement ends with a semi colon. PL/SQL code blocks are followed by a forward slash / in the first position of the following line.  This causes the code block statements to be executed. The End keyword is the only PL/SQL code block keyword that is followed by a semi colon.

PL/SQL procedures are wrapped in Packages for reusability.

Anonymous block:  PL/SQL block that is contained in an external file are called anonymous block procedures and can be executed from the SQL prompt by using START keyword, which causes SQL*Plus to load, compile and execute the code block in the file.
e.g.

START c:\plsql\code.sql

Note: to start sqlplus type sqlplus from command prompt.


PL/SQL  Code Block has following sections:

0. Header: This is optional first block. It is used to identify the type of code block and its name. The code block types are:
          a. Anonymous procedure
          b. Named procedure
          c. function

1. Declaration(optional)-  This is an optional section and contains the names of the local objects that will be        used in the code block. These include variables, cursor definitions and exceptions. This section begins with the keyword DECLARE.

2. Executable(mandatory): This section starts with keyword BEGIN and is terminated by keyword END. This is the only mandatory section and contains the statements that will be executed i.e.  SQL statements, DML statements, procedures(PL/SQL code blocks returning a value) , and build in subprograms.  This section starts with the word BEGIN.

3. Exception(optional): An optional section to handle any errors that occur during the execution of the statements and commands in the executable section.

Code block is terminated by the END keyword, the only construct keyword that is followed by a semicolon.  Executable is the only required section, which means the code block must have BEGIN and END. The code block is executed by /

declare
   local_variable varchar2(30);
begin
   select 'Num of employees'|| to_char(count(last_name), '999')
       into local_variable
  from employee;
 dbms_output.put_line(local_variable);
exception
   when others then dbms_output.put_line('ERROR OCCURED');
end;
/

The select command is used to compute the number of employees in the database, concatenate this value with another string and assign the value to variable defined in declare section.  The final section is the output section. It contains a catchall error handler, the statement which is executed when a database error occurs and when an error handler is not available.


Variable Definition:


Variables are defined as :

Variable_Name   Datatype(precision) ;

in Declaration section. Oracle treats variable definition as any other statement so a semi colon is mandatory at the end. Value may be assigned to a variable during definition as well. The variable may also be constrained.

 Character Definitions:

 Variables that contain alphanumeric values are normally defined as CHAR or VARCHAR2.
VARCHAR2 : contains null spaces in each position that a character does not occupy. Defining length is mandatory.  Maximum length can be 32767.
CHAR: contain white space in the unoccupied positions.  Whitespace and null are not the same and are not
comparable.  Length when not specified is 1. maximum can be 32767.

Note: PL/SQL length of CHAR and VARCHAR2 datatypes is larger than the length allowed in the Oracle database.

Numeric Definitions:

Numeric data definitions can include two parameters: precision and scale. Precision defines overall length of the value. Scale determines the number of digits to the left or the right of the decimal point. The range of the scale is -84 to 127.  When a scale is specified, rounding will occur at the end. Following rules apply:

  a. Positive scale definitions cause rounding to the right of the decimal point.
  b. Negative scale definitions cause rounding to the left of the decimal point.
  c. Zero scale definitions cause rounding to the nearest whole number.

The default precision of a number is 38.

e.g. age  integer(3); :  age must be a whole number less than 999
 galloons number(3);  the scale is not specified, the value can contain upto three places to the left of decimal or upto three positions. The overall length of the value can't be more than three positions.

salary  number(8, 2) ;  The overall length of value can't be more than 8 positions including decimal values. At max the variable will have two positions to the left of decimal.

Other datatypes: Several other datatypes including Boolean, Date, Exception can be used.

Constrained  Definitions:
 Constraints can be applied on the variables defined in the code block. Two common constraints are :

CONSTANT:    ensures that the value is not changed after a value is initially assigned to the variable.  If a statement tries to change the variable value, an error will occur.

NOT NULL:  ensures that the variable will always contain a not null value. If a statement tries to assign a null value to the variable, an error will occur.

e.g

febonaci  constant number(9,8): =12.333223;
resonate not null date:='08-APR-53'


Aggregate and PL/SQL Record Definitions

Aggregate variable definition is based on a database of PL/SQL object. It has following advantages:

a. The developer can define a variable with the same data specifications as a table column or cursor variable,
    without actually knowing the specifications.
b. The developers can set up an array of variables for a cursor or table record with one statement. The
    variables will have the same specifications as the table or cursor variables.

The first aggregate definition tool is %TYPE which allows developer to define a variable with the same specifications as the indicated table or cursor variable. Syntax:

VARIABLE_NAME  table_cursor_name.columns_name%type;

e.g.
lname    employee.last_name%type;

the second keyword is used to establish an array of variables based on the columns in a cursor or table. This aggregate is called a PL/SQL record. The keyword will create a variable for each column in the identified table/cursor.  Each variable in the array will have the same name and specification as its counterpart in the table/cursor.

syntax: ARRAY_NAME     table/cursor_name%rowtype;

Here is an e.g. for aggregate array definition:
 DEPT_VAR department%rowtype;

variables are acccessed as :

DEPT_VAR.department
DEPT_VAR.department_name

Assigning values to Variables


PL/SQL has two ways to assign values to variables:

:= assignment operator           assigns the argument of the left of the operator to the argument of variable on
                                             the right of the sign.

INTO                                   keyword is used in a SELECT or FETCH statement. When used in a
                                            SELECT statement, it assigns the values in the SELECT clause to the variables
                                            following the INTO keyword. When used with the FETCH statement, it
                                            assigns the cursor values to the variables that follow the INTO keyword.

When using the INTO keyword in a SELECT clause, be sure that only one record is retrieved. If more than one record is retrieved an, error will occur.






















Saturday, 19 May 2012

SQLs




My Important SQLs:


--Creating a db link



create database link sbl78_link connect  to ora321 identified by  ora321
using   '(DESCRIPTION = 
     (ADDRESS_LIST = 
(ADDRESS = (PROTOCOL = TCP) (HOST = hostname.domainname)(PORT = 1521))) 
                                (CONNECT_DATA = 
(SERVICE_NAME = service_name)
                                )
)'
              
 
 
 
 Querying the remote db: 
 
 select * from remote_table@dblink_name
 
 


2. SELECT
ODI_TABLE.TABLE_NAME,
ODI_TABLE.COLUMN_NAME,
case when ODI_table.longc = siebel.data_length/4 then "pass" else "fail" end result1,
case when ODI_table.dummy_col is NULL then "table not exist" else "pass" end result2 

FROM 
(select snp_table.table_name TABLE_NAME, snp_col.col_name COLUMN_NAME, snp_col.source_dt, snp_col.longc, 1 dummy_col from snp_col, snp_table, snp_model where 
snp_col.i_table= snp_table.i_table and snp_table.i_mod = snp_model.i_mod
and snp_model.i_mod= 17001 and (snp_col.SOURCE_DT ='VARCHAR2' OR snp_col.SOURCE_DT = 'CHAR')
and snp_table.table_name like 'S%') ODI_TABLE

left outer join SBL78_LINK@ALL_TAB_COLUMNS siebel

ON ODI_TABLE.TABLE_NAME= siebel.TABLE_NAME AND ODI_TABLE.COLUMN_NAME = SIEBEL.COLUMN_NAME 1


3. alter tablespace users add datafile 'c:\extend3.dbf' size 15120M

alter database tempfile 'C:\app\fakebideveloper\oradata\orcl\TEMP02.dbf' resize 2048M

4. data validation:



select * from all_tab_columns ALL_TABLES where owner = 'SCHEMA NAME'  AND TABLE_NAME ='TABLE NAME'
AND COLUMN_NAME LIKE '%_WID' ORDER BY COLUMN_NAME

5.  When max is applied on two date columns then the max works perfectly and returns only one row with max values of each date column , no matter whether the two dates are max for the same record.

create table Test (
start_dt date,
end_dt date,
Amt number(10)
)


insert into Test
values (TO_DATE('01/01/1980', 'DD/MM/YYYY'), TO_DATE('12/03/2050', 'DD/MM/YYYY') , 5000)

--Insert another row with changed dates


SELECT MAX(START_DT), MAX(END_DT) FROM TEST

6. Importing db dump  files exported with expdp :
     a. First create user for the dmp file to be imported and define a directory and grant read, write privilege
     on directory created to the user just created :


CREATE USER supplier2 IDENTIFIED BY supplier2 DEFAULT TABLESPACE users;
ALTER USER supplier2 IDENTIFIED BY supplier2 ACCOUNT UNLOCK;
GRANT DBA to supplier2;

CREATE USER obiee_navteq IDENTIFIED BY obiee_navteq DEFAULT TABLESPACE users;
ALTER USER obiee_navteq IDENTIFIED BY obiee_navteq ACCOUNT UNLOCK;
GRANT DBA to obiee_navteq;

CREATE OR REPLACE DIRECTORY datapump as 'd:\datapump';
GRANT READ, WRITE ON DIRECTORY datapump to supplier2;
GRANT READ, WRITE ON DIRECTORY datapump to obiee_navteq;

ALTER SYSTEM SET SEC_CASE_SENSITIVE_LOGON = FALSE;

exit;


2. Place dump file in the DIRECTORY location specified above:

3. Go to command prompt and enter following:

impdp supplier2/supplier2@XE schemas=supplier2 directory=datapump dumpfile=supplier2.dmp logfile=impdpsupplier2.log

hit enter.

Troubleshooting:
  If after step 3 an error is displayed :
UDI-12154: operation generated ORACLE error 12154
ORA-12154: TNS:could not resolve the connect identifier specified

Apparently something is wrong with tnsnames.ora.

In that case :
   a. Enter sqlplus on command prompt, press enter. SQLPlus command prompt opens up.
   b. Prompt will ask username: username/pwd@sid
      e.g. local_source/oracle@xe

if you are not able to connect tnsnames.ora either doesn't have entry for sid xe or some mismatching quote or  character is causing issue in tnsnames.ora. Take a backup of tnsnames.ora and try editing it.

6.1 Exporting a database schema using expdp(Oracle 10g onwards):


  If your database from which you want to export a schema is on a machine which has obiee installed, chances are your ORACLE_HOME environment variable is pointing to OBIEE's oracle home. Follow these steps to point it temporarily to your database:

    a. Open command prompt. type expdp( the location of expdp should be set in PATH in order to invoke it from command line) . If you see an error:

UDE-00013: Message 13 not found; No message file for product=RDBMS, facility=

UDE-00019: You may need to set ORACLE_HOME to your Oracle software directory

    go to step b.
  b. set ORACLE_HOME to C:\app\fakebideveloper\product\11.1.0\db_1

  set ORACLE_HOME=C:\app\fakebideveloper\product\11.1.0\db_1
  in the command prompt.
  c. Login to the database with a user other than the one you want to export(EBS1213_DW_1) and execute following:


CREATE OR REPLACE DIRECTORY dump_dir AS 'c:\datapump';
GRANT READ, WRITE ON DIRECTORY dump_dir TO EBS1213_DW_1;

 d. go to the DB server from where you've to export a schema, and execute the following:

expdp EBS1213_DW_1/oracle@orcl schemas=EBS1213_DW_1 directory=DUMP_DIR dumpfile=EBS1213_DW_1.dmp logfile=expdpEBS1213_DW_1.log


Note: ORACLE_HOME must be set already and the physical directory as mentioned in b. above must already exist in DB server machine.
 
6.2 Exporting a schema from a remote DB server:

    a. Create an entry in your tnsnames.ora(in the client machine):



ORCL_QA =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = xxxxxxxxsvdb.us.oracle.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = qa26a)
    )
  )


  b. Create a db link to the remote server:

create public database link remoteloc connect to ORA26109 identified by ORA26109 using 'ORCL_QA';

  c.

 expdp remote_user/remote_pwd directory=dumpdir logfile=impi.log network_link=remoteloc schemas=scott dumpfile=newscot.dmp



Here are some important metadata tables accessible to Users in Oracle db:


 DBA_ALL_TABLES         ALL_ALL_TABLES       USER_ALL_TABLES
 DBA_PARTIAL_DROP_TABS  ALL_PARTIAL_DROP_TABS USER_PARTIAL_DROP_TABS
 DBA_PART_TABLES        ALL_PART_TABLES       USER_PART_TABLES 
 DBA_TABLES             ALL_TABLES            USER_TABLES          TAB 
 DBA_TABLESPACES                              USER_TABLESPACES 
 DBA_TAB_COLUMNS      ALL_TAB_COLUMNS         USER_TAB_COLUMNS 
 DBA_TAB_COLS         ALL_TAB_COLS            USER_TAB_COLS 
 DBA_TAB_COMMENTS     ALL_TAB_COMMENTS        USER_TAB_COMMENTS 
 DBA_TAB_HISTOGRAMS   ALL_TAB_HISTOGRAMS      USER_TAB_HISTOGRAMS 
 DBA_TAB_MODIFICATIONS  ALL_TAB_MODIFICATIONS USER_TAB_MODIFICATIONS 
 DBA_TAB_PARTITIONS   ALL_TAB_PARTITIONS      USER_TAB_PARTITIONS

7. Changing NLS_DATE_FORMAT in Oracle:

The Oracle nls_date_format environmental variable does much more than control the date display within Oracle.  Changing the value of nls_date_format will change the way that the data is displayed in Oracle SQL*Plus:
ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY MM DD'; select sysdate from dual; ALTER SESSION SET NLS_DATE_FORMAT = 'HH24:MI:SS'; select sysdate from dual;

Oracle SQL:

Pseudocolumns: Oracle makes available several Phantom columns, known as pseudocolumns, that do not exist in any tables. Rather they are values visible during query execution that can be helpful in certain situations. 

e.g.
SYSDATE current date and time  ROWNUM sequence number assigned to retrieved rows  ROWID unique identifier for a row  UID number associated with a user  USER userid of current user 
Level is also a pseudo column .
Usage:

select user,  sysdate, cust_nbr from customer;

Dealing with hierarchical data:


Let's see Oracle sql functionality to deal with hierarchical data. Here is Employee table which stores parent child relationship in between employee and his manager as follows:







This hierarchical data can be represented in a chart as :

Problem 1: finding out the root node of this tree:
Solution: select lname from employee where emp_manager_id is null;

Problem 2: finding out the leaf nodes:
Solution: select lname from employee e where e.emp_id not in (select manager_emp_id from employee emp where emp.manager_emp_id is not null );

Problem3: Flattening the parent -child relationship so that in a single row we can see the entire reporting line.
Solution:  This requires knowledge of number of levels in the hierarchy. We'll have to do a series of left outer joins.


select e1.lname emp_name, e2.lname mgr_name, e3.lname, e4.lname from employee e1 left outer join employee e2 on e1.manager_emp_id =e2.emp_id
left outer join employee e3 on e2.manager_emp_id = e3.emp_id left outer join employee e4 on e3.manager_emp_id = e4.emp_id;




For this hierarchy to be consumed by OBIEE10 g the leaf node is repeated upto the last level.



Note: when you say flattened parent-child hierarchy, each row represents path from a node upto the root node, including the path from the root node to itself.

Problem 4: Flattening a parent child relationship to see each manager with it's parent. This is a more generic problem, since the #3 was more from a standpoint of OBIEE 10 g consume able form.


select e1.lname, e2.lname, e3.lname, e4.lname from employee e1 left outer join employee e2 on e1.emp_id = e2.manager_emp_id left outer join employee e3 on e2.emp_id = e3.manager_emp_id left outer join employee e4 on e3.emp_id = e4.manager_emp_id where e1.manager_emp_id is null;



note: if one just wants to see the hierarchy for visual inspection, the query provided as a solution for prob#4 can be used. However if the output of the query has to be in a form to be consumed by OBIEE 10g then the solution to problem#4 is more suitable.


Start With, Connect By and Prior in Oracle SQL

The information in a table can be retrieved in a hierarchical form with the help of Select statement's Start With .... Connect By clause.

[ [START WITH condition1 ] CONNECT BY condition2]

START WITH condition1:  condition1 specifies the way to identify the root node of a hierarchy. All rows that satisfy the condition will be considered roots. If you don't specify a START WITH clause all rows are considered root rows(which is usually not desirable). A subquery can be included in Condition1.

CONNECT BY Condition2:  Specifies the relationship between. parent rows and child rows in the hierarchy. The relationship is expressed as a comparison expression, where columns from the current row are compared to corresponding parent columns. condition2 must contain the PRIOR operator, which is used to identify columns from the parent row. Condition2 can't contain a subquery.


PRIOR is an oracle sql operator used only in conjunction with Hierarchical queries. When PRIOR is used in a Connect By condition the expression following PRIOR is evaluated for the Parent Row of the current row in the query.

e.g.

select lname, emp_id, manager_emp_id from employee
START WITH manager_emp_id is null
CONNECT BY PRIOR emp_id = manager_emp_id;



As shown above this lists the hierarchical information for visual inspection, compare this information with the diagram of the hierarchy.

Note: the PRIOR keyword doesn't need be listed first. for e.g. the query above can be written as:


select lname, emp_id, manager_emp_id from employee
START WITH manager_emp_id is null
CONNECT BY  manager_emp_id = PRIOR emp_id ;

both of them return the same result.

Given that Condition1 specifies the starting point of the hierarchy(or root node), we can get the relative hierarchical position, by making root as the node who's hierarchy we want to see:

e.g.


select lname, emp_id, manager_emp_id from employee
START WITH manager_emp_id i = 'BLAKE'
CONNECT BY  manager_emp_id = PRIOR emp_id ;


This gives all the direct/indirect subordinates of BLAKE



We can also have a subquery for Condition1,  suppose we want to find out the employees reporting directly/indirectly to the oldest employee:

select lname, emp_id, manager_emp_id from employee
START WITH hire_date = (Select min(hire_date) from employee)
CONNECT BY PRIOR  emp_id =manager_emp_id;


Note: beware that the subquery should return only one column and row. If it returns more than one row then you'll have to use IN operator instead of = , but  in that case there will be multiple roots.

LOOP/CYCLE in hierarchical data:

If  a row is both parent and child of another row then it's a cyclic data. Since the Connect By condition specifies a parent child relationship, it can't contain a loop.

More than One column to define Parent- Child relationship:

When a parent child relationship involves more than one column, you'll have to use PRIOR operator before each column(which should be picked up for parent record).

e.g. select * from assembly
START WITH parent_assembly_type is null and parent_assembly_id is null
CONNECT BY parent_assembly_type = PRIOR assembly_type
and parent_assembly_id = PRIOR assembly_id;

The LEVEL Pseudocolumn


Oracle provides pseudocolumn LEVEL to represent levels in a hierarchical tree. Whenever we use START WITH ... CONNECT BY clauses in a hierarchical query, you can use the pseudocolumn LEVEL to return the level number for reach row returned by the query.

for e.g.

select level, lname, emp_id, manager_emp_id
from employee
START WITH manager_emp_id is null
CONNECT BY manager_emp_id = PRIOR emp_id;


as shown in the resultset, each employee is now associated with a number representing it's level in the organisation.

Problem 5: Complex Hierarchy Operations can be performed much more easily than using standard ANSI sql:

 a. Finding the Number of Levels in the hierarchy:

SELECT MAX(LEVEL)
FROM
EMPLOYEE
START WITH manager_emp_id is null
CONNECT BY PRIOR emp_id = manager_emp_id;


o/p: 4

b. Deterimine the number of employees at each level, group the results by Level and count number of employees in each distinct group:

SELECT LEVEL, Count(emp_id)
from Employee
START WITH manager_emp_id is null
CONNECT BY PRIOR emp_id = manager_emp_id
GROUP BY LEVEL;

;




Group Operations

Questions such as: 
1. who is the highest paid executive in Operations department. 
2. What was the sales by Region in the year 2018. 

require application of group functions which aggregate data over a set of rows returning a single value. 

Aggregate Function syntax:   aggregate_function([Distinct | All] expression)

aggregate_function: is the name of the function e.g. SUM, COUNT, MAX, MIN

DISTINCT: specifies that the aggregate function should consider only distinct values of the argument  
                   expression. 
ALL:    specifies the aggregate function should consider all values including all duplicate values of the
            argument expression.  The default is ALL.

expression: specifies a column or any other expression, on which you want to perform the aggregation.

Aggregate functions and NULLs


All aggregate functions ignore NULLs except COUNT(*) and GROUPING.

Count(*) counts rows and not values since the concept of NULL does not apply to rows.
Note that Count() does not ignore NULLs.

Question: What is the result of Count(1)  or Count(2) or Count(3) ?

select Count(1), Count(2), Count(3) from Employee;

Answer: The parameter to function Count is an expression. The function Count, like any other Aggregate function,  counts the Not Null values of this expression for each row. So 1 will be not null for first row, second row, third row so on and so forth for all rows. So all these three expressions will return the row count of the table.

following three SQLs:



select avg(SALE_PRICE) FROM CUST_ORDER;

SELECT SUM(SALE_PRICE)/COUNT(*) FROM CUST_ORDER;

SELECT SUM(SALE_PRICE)/COUNT(SALE_PRICE) FROM CUST_ORDER;

first and third will give the same result while second will give the wrong result as COUNT(*) will also count the rows where SALE_PRICE is NULL.

In some cases you may want an average to be taken over all the rows in a table, not just the rows with non null values for the column in question. In those situations you'll have to use NVL function within the AVG function call to assign 0 (or some other useful value) to the column in place of any NULL values.


SELECT AVG(NVL(SALE_PRICE,0)) FROM CUST_ORDER

result:39.4

all the rows are considered for the average computation and the rows with NULL value for sale_price are assumed to have a 0 value for that column.

Aggregate functions and DISTINCT and ALL

Most Aggregate functions allow the usage of DISTINCT and ALL along with the expression argument. DISTINCT allows you to disregard duplicate values of the expression whereas ALL causes duplicate expression values to be included in the result.
e.g.
select count(distinct cust_nbr), count(cust_nbr), count(all cust_NBR) from cust_order;

result: 4 , 20, 20

Note: that the default ALL takes into account the duplicate values of the expression but does not take into account the NULL values.

e.g.
select count(sale_price) , count(ALL sale_price), count(distinct sale_price) from cust_order;

Result:  14, 14, 4

There are totally 20 records 6 have null values for SALE_PRICE, so the result above.

Since ALL is the default, you can explicitly use ALL with every aggregate function. However, the aggregate functions that take more than one argument as input don't allow the sue of DISTINCT. These include CORR, COVAR_POP, COVAR_SAMP, and all the linear regression functions.

In addition, some functions that take only one argument don't allow the use of DISTINCT. This category includes STTDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and GROUPING.


Group By Clause


Usually Aggregation need to be done by first grouping the data in different sets and then getting one aggregated value for each group. Group By clause fills this requirement. 

select cust_nbr, count(order_nbr) from cust_order
group by cust_nbr;

a. All non aggregate expressions in Select list should be part of Group By clause. 
b. Aggregate functions not allowed in Group By clause.
c. Constants can be omitted from Group By clause. 

e.g. Select 'Customer', cust_nbr, Count(order_nbr)
from Cust_order
Group By 
Cust_Nbr


or

SELECT 'Customer' , cust_nbr, Count(order_nbr)
from Cust_order
Group by
Cust_Nbr, 'Customer';

both are valid(reason being, since it is a constant for all the rows it can no way affect the way data is grouped).

d. Scalar functions may be grouped by their underlying column.


SELECT  SUBSTR(lname, 1,1 ) , COUNT(*) FROM EMPLOYEE GROUP BY
LNAME;

This query is syntactically correct and allowed. However if you look at the result set the Grouping happens based LNAME. So if there are two LNAME values  'ARMSTRONG' and 'AGNEL' there will be two records for each.  In case you want grouping to be done on the basis of just the first character, use SCALAR function in the Group By Clause as follows:

SELECT SUBSTR(lname, 1,1 ) , COUNT(*) FROM EMPLOYEE GROUP BY
SUBSTR(lname, 1,1);

In this case both 'ARMSTRONG', 'AGNEL' will fall in the same group.

e.  Concatenated columns may be grouped in either of two ways, both giving the same result.

SELECT MANAGER_EMP_ID|| JOB_ID, COUNT(*) from EMPLOYEE
GROUP BY  MANAGER_EMP_ID || JOB_ID;

SELECT MANAGER_EMP_ID||JOB_ID, COUNT(*) from EMPLOYEE
GROUP BY  MANAGER_EMP_ID, JOB_ID;

both the queries above give the same result.

f. Sometimes you may want to keep a non aggregate column in the Select list but not in the Group By clause.
for e.g. you want to see line number along with the summarised information. If you try following :

SELECT ROWNUM, CUST_NBR, COUNT(ORDER_NBR) FROM CUST_ORDER
GROUP BY CUST_NBR, ROW_NUM;

Note: ROWNUM is the pseudo column.

Since ROWNUM is in the Group By and each row will be assigned a new number so the result is grouped by ROWNUM. This is not what you wanted. So use following instead:

SELECT ROWNUM, V.* FROM
(SELECT CUST_NBR, COUNT(ORDER_NBR) FROM CUST_ORDER GROUP BY CUST_NBR) V;

g. You are not required to show your GROUP BY columns. However the result can confuse the user.

e.g. SELECT  COUNT(ORDER_NBR)  FROM CUST_ORDER GROUP BY CUST_NBR

the o/p of the query won't make sense to the user unless the user sees the query along with the output.
Therefore it is recommended that the non aggregate columns be included in select clause as well.

h. Group by clause will create a separate group for NULL values of the columns in the group by clause and aggregate values for it and present.

i. Group by clause when used with Where Clause requires the Where Clause to appear first. Oracle first filters out the records which don't satisfy the condition and then does a Group By.

SELECT CUST_NBR, COUNT(ORDER_NBR) FROM CUST_ORDER
WHERE SALE_PRICE >  25
GROUP BY CUST_NBR;
j.  HAVING Clause is closely associated with GROUP BY clause. HAVING Clause is used to put a filter on the Groups created by Group By clause. If a query has a HAVING clause along with a GROUP By clause, the result set will include only the groups that satisfy the condition specified in the HAVING clause.

SELECT CUST_NBR, COUNT(ORDER_NBR) FROM CUST_ORDER
GROUP BY CUST_NBR
HAVING CUST_NBR<6;

The output only includes customers with numbers below 6. That's because the HAVING clause specified CUST_NBR <  6 as a condition. Orders for all Customers were counted but only those groups that matched the specified HAVING condition were returned as the result.

Note: Having clause filters data after summarization whereas Where clause filters data before summarization, so the example above is not appropriate. Instead of Having clause, Where clause should be used in this example.

A more appropriate example will be following:

SELECT CUST_NBR, COUNT(ORDER_NBR)
FROM CUST_ORDER
GROUP BY CUST_NBR
HAVING COUNT(ORDER_NBR) > 2;

This is an appropriate use of HAVING clause because the value of Count(ORDER_NBR) is not known until grouping takes place.

Note: The syntax for the Having clause is similar to that of the WHERE clause. However there is one restriction on the conditions you can write in the HAVING clause.
A HAVING clause can only refer to an expression in the SELECT list or to an expression involving aggregate function.  If you specify a condition which is not in Select list or not an Aggregate function you'll get an error:

SELECT  CUST_NBR, COUNT(ORDER_NBR) 
FROM CUST_ORDER 
GROUP BY CUST_NBR
HAVING ORDER_DT < sysdate;

this will result in an error:

ORA-00979: not a group by expression.

However you can use an aggregate function in HAVING clause even if it's not present in Select list. 
(The reason for above constraint is very simple, before aggregation, Group by does grouping based on the non aggregate column in the Select(and Group By), after that aggregation happens and post aggregation HAVING clause filter is applied.)

Java Learnings

1. Never try to create two Iterators of the same underlying object:
for e.g

iterator1 = odiJoins.iterator()
iterator2= odiJoins.iterator()

the second iterator will never be able to manipulate objects.
use this instead.:

iterator1=odiJoins.iterator()

//processing using iterator1


iterator1 =odiJoins.iterator()

//again some processing using iterator1


2. Whenver iterator is used


always use


iterator.hasNext() for looping .



ODI SDK NOTES:

oracle.odi.core.persistence.transaction.support.TransactionCallbackWithoutResult
implements ITransactionCallback

TransactionCallbackWithoutResult is a convenient class for ITransactionCallback interface. It helps in defining a doInTransaction() without the need for a return statement. 

doInTransaction(ITransactionStatus pStatus) 

doInTransaction() is called by TransactionTemplate.execute(ITransactionCallback)


Interface ITransactionCallback

oracle.odi.core.persistence.transaction.support.ITransactionCallback

Callback interface for transactional code. Used with TransactionTemplate's execute method, often as anonymous class within a method implementation.

java.lang.object doInTransaction(ITransactionStatus pStatus)

gets called by TransactionTemplate.execute(ITransactionCallbac) within a transactional context does not need to care about the transaction itself. This method can return an ODI entity or a collection of ODI entities created within the transaction.

Class TransactionTemplate

oracle.odi.core.persistence.transaction.support.TransactionTemplate extends DefaultTransactionDefinition
All Implemented Interfaces:
ITransactionDefinition
Class TransactinTemplate  simplifies programmatic transaction demarcation and transaction exception handling against an ITransactionManager. The method execute(ITransactionCallBack) is key to this class, supporting transactional code that implements ITransactionCallback interface. 
Note: this template handles the transaction lifecycle and possible exceptions such that neither the ITransactionCallback implementation nor the calling code need to handle transactions. 

Constructor:

TransacitonTemplate(ITransactionManager)
Constructs a new transaction template using the given transaction manager.

TransactionTemplate(ITransactionManager, ITransactionDefinition)
Constructs a new transaction template using the given transaction manager

Method: execute(ITransactionCallback)
execute the action specified by the given callback object within a given transaction.
Returns a result object created by the callback of null.

Interface ITransactionManager



oracle.odi.core.persistence.transaction 

Transaction management interface. 

methods:

1. void commit(ITransactionStatus)

2. ITransactionStatus getTransaction(ITransactionDefinition )
gets the currently Active transaction or creates a new one according to the given transaction definition(containing a propagation behavior). If the given definition is null a default definition will be used (with default propagation behavior and isolation level). Parameters like isolation level and timeout will only be applied to new transactions and will be ignored when participating with active ones. 

It returns transaction status object representing the new or current transaction.

3. rollback(ITransactionStatus)
Perform a rollback of the given transaction.