Grid Control

Definitions

This document describes the troubleshooting steps to follow when there is a communication problem between the Oracle Management Service (OMS) and the Grid Agent.

The 10g Enterprise Manager Grid Control includes the following components:

  • OMS - Management Service - Responsible for communicating with the user via a GUI and for communicating with the OMR
  • OMR - Management Repository - A collection of tables owned by the sysman schema that stores all the data collected by the OMA
  • OMA - Management Agent - A Perl program that runs on the database host (one for all databases on the host) and uploads data to the OMS for storage in the OMR

Additional references:

  • Note 951076.1: How to Troubleshoot Communication From a Grid Agent to the Oracle Management Service (OMS) in 10g Enterprise Manager Grid Control?
  • Note 1089443.1: How to Troubleshoot Communication From the Grid Console (UI) Machine to the Oracle Management Service (OMS) in 10g Enterprise Manager Grid Control?
  • Note 1089693.1: How to Troubleshoot Communication From the Oracle Management Service (OMS) to the Grid Control Repository Database in 10g Enterprise Manager Grid Control?

A useful set of emcli commands

Login to EM on OMS

emcli login -username=sysman [-password=<sysmanpassword>]

Log out of EM on OMS

emcli logout

Sync emcli with OMS

emcli sync

List promoted targets

emcli get_targets

Delete a specific target

emcli delete_target -name="demo" -type="oracle_database"

Delete an agent and its targets

emcli delete_target -name="xxxlgc-prod2.mydomain.com:3872" -type="oracle_emd" -delete_monitored_targets

Follow a plugin deployment (on the OMS / on an agent)

Oracle Database plugin

emcli get_plugin_deployment_status -plugin_id=oracle.sysman.db

Oracle Fusion Middleware plugin

emcli get_plugin_deployment_status -plugin_id=oracle.sysman.emas

My Oracle Support plugin

emcli get_plugin_deployment_status -plugin_id=oracle.sysman.mos

Import an update (for example: a plugin update) into the software library

emcli import_update -file="p14018177_112000_Generic.zip" -omslocal

Deploy a plugin on the OMS

emcli deploy_plugin_on_server -plugin=oracle.sysman.db -sys_password=XXXXX

Deploy a plugin on EM agent(s)

emcli deploy_plugin_on_agent -plugin="oracle.sysman.db"  -agent_names="xxxdb-prod1.mydomain.com:3872;xxxdb-prod2.mydomain.com:3872"
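
When many agents are involved, the semicolon-separated -agent_names value is easy to mistype. The following is a minimal sketch of a helper that builds it from a port and a list of hosts; the function name agent_names is hypothetical, and it assumes all agents listen on the same port.

```shell
# Build the value for the -agent_names option of "emcli deploy_plugin_on_agent".
# Usage: agent_names PORT HOST [HOST...]
# The body runs in a subshell so the helper variables do not leak out.
agent_names() (
    port=$1
    shift
    names=""
    for host in "$@"; do
        # Separate entries with ";" without producing a leading separator.
        names="${names:+$names;}$host:$port"
    done
    printf '%s\n' "$names"
)

# Example (commented out because it needs a logged-in emcli session):
# emcli deploy_plugin_on_agent -plugin="oracle.sysman.db" \
#     -agent_names="$(agent_names 3872 xxxdb-prod1.mydomain.com xxxdb-prod2.mydomain.com)"
```

With the two hosts from the command above, the helper prints the same string that was typed by hand there.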

List available agents in the library

emcli get_supported_platforms

Download an agent from the library (used for agentDeploy.sh script method)

emcli get_agentimage -destination=/home/oracle -platform="Microsoft Windows x64 (64-bit)" -version="12.1.0.1.0"

Set monitoring credentials for a specific target (example given for an Oracle database instance)

emcli set_credential -target_type=oracle_database -target_name="prod1" -credential_set=DBCredsMonitoring -user=sysman -column="Role:SYSDBA;UserName:sys;password:XXXXX" -monitoring

Uninstall the agent oracle home that registered with inventory

If you get "agent home is already registered with the inventory", an entry for this home already exists in the Oracle inventory (inventory.xml under the oraInventory directory), and it must be removed before installing into the same directory.
Agent homes are listed in /etc/oragchomelist (Linux, AIX) and /var/opt/oracle/oragchomelist (Solaris), but removing the entry from oragchomelist makes no difference.
The entry that matters is in the Oracle inventory; the inventory location can be found in /etc/oraInst.loc (Linux, AIX) and /var/opt/oracle/oraInst.loc (Solaris).

ORACLE_HOME="/oracle/cloud/core/12.1.0.1.0"
$ORACLE_HOME/oui/bin/runInstaller -silent -detachHome ORACLE_HOME="$ORACLE_HOME"
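
Before detaching a home, it can help to confirm where the inventory lives and which homes it lists. The sketch below assumes the usual layout: an inventory_loc= line in oraInst.loc, and HOME elements in ContentsXML/inventory.xml under that directory. The function names are hypothetical.

```shell
# Print the inventory directory recorded in an oraInst.loc file.
# Usage: inventory_loc [FILE]   (FILE defaults to /etc/oraInst.loc)
inventory_loc() {
    sed -n 's/^inventory_loc=//p' "${1:-/etc/oraInst.loc}"
}

# Print the HOME entries registered in the inventory under INVENTORY_DIR.
# Usage: registered_homes INVENTORY_DIR
registered_homes() {
    grep '<HOME ' "$1/ContentsXML/inventory.xml"
}

# Example (commented out; on Solaris pass /var/opt/oracle/oraInst.loc instead):
# registered_homes "$(inventory_loc)"
```

If the agent home still appears in that listing after the detach, the inventory entry has not been removed.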

Get rid of an agent that won’t go away in the Grid screens

This has the effect of removing everything related to the host in question!
On host where agent is running...

emctl stop agent

On Grid server...

sqlplus / as sysdba
exec sysman.mgmt_admin.cleanup_agent('<host>:<port>');
exit

On host where agent is running...

emctl start agent

Get rid of targets that won’t go away in the Grid screens

exec mgmt_admin.delete_target('target_name','target_type')

See mgmt_targets table in sysman schema for list of known targets.

Re-discover targets

cd $ORACLE_HOME/bin
./agentca -d

Agent Unreachable

  • Is the agent running?

From SQL*Plus:

select username, program from v$session where lower(program) like 'emagent%';

If no rows are selected, the agent is not running.

Alternatively, from Unix, set the environment with oraenv (answering agent11g at the ORACLE_SID prompt; see "which databases are running on the machine" for the available names) and check the agent status:

. oraenv
emctl status agent

ps -ef | grep '[a]gent'

There should be a handful of agent processes. If there are only a few, kill them and restart the agent.

emctl start agent

Check the agent log in $ORACLE_HOME/sysman/log
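
The status check can also be scripted. This is a minimal sketch, assuming that a healthy agent ends its emctl status agent output with the line "Agent is Running and Ready"; that wording is an assumption to verify against your release, and the function name is hypothetical.

```shell
# Read "emctl status agent" output on stdin and succeed only if it reports
# a running agent. The matched text is an assumption about the output format.
agent_is_ready() {
    grep -q 'Agent is Running and Ready'
}

# Example (commented out because it needs emctl on the PATH):
# emctl status agent | agent_is_ready || echo "agent needs attention"
```

Matching on the text rather than on the exit status of emctl avoids relying on exit codes, which this page does not document.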

Start a host level blackout from command line (blackout has to already have been created via OEM)

This could be included in a shell script before patching, for example...

emctl start blackout server_maint -nodeLevel

Start a host level blackout for a certain duration

This starts a blackout from now until now + 8 hours

emctl start blackout server_maint -nodeLevel -d 08:00

Stop a blackout from command line

When patching is done...

emctl stop blackout server_maint
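
The start and stop commands above can be wrapped around a maintenance task so the blackout is always stopped afterwards, even when the task fails. This is a sketch only: with_blackout and the EMCTL override variable are hypothetical names, and the blackout must already exist as described above.

```shell
# EMCTL can be overridden, for example with the full path to the agent's emctl.
EMCTL=${EMCTL:-emctl}

# Usage: with_blackout BLACKOUT_NAME COMMAND [ARGS...]
# Runs in a subshell, so "exit" only sets the function's return status.
with_blackout() (
    name=$1
    shift
    "$EMCTL" start blackout "$name" -nodeLevel || exit 1
    # Run the maintenance command and remember its exit status.
    "$@"
    status=$?
    # Always try to stop the blackout, even if the command failed.
    "$EMCTL" stop blackout "$name"
    exit "$status"
)

# Example (commented out; apply_patches.sh is a placeholder for your own script):
# with_blackout server_maint ./apply_patches.sh
```

The function returns the exit status of the maintenance command, so a calling script can still detect a failed patch run.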

Backend WLS or EM application seems to be down

To list targets known to an agent:

emctl config agent listtargets

It looks at the file $AGENT_HOME/sysman/emd/targets.xml

Manually add targets by editing this file and running:

emctl config agent addtargets $AGENT_HOME/sysman/emd/targets.xml

Check state and upload directories under $AGENT_HOME/sysman/emd for .err files
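
The check for .err files can be done with find. A minimal sketch, with a hypothetical function name, that takes the emd directory as its argument:

```shell
# List .err files in the state and upload directories of an agent.
# Usage: list_err_files EMD_DIR   (for example "$AGENT_HOME/sysman/emd")
list_err_files() {
    find "$1/state" "$1/upload" -type f -name '*.err' 2>/dev/null
}

# Example (commented out because it needs AGENT_HOME to be set):
# list_err_files "$AGENT_HOME/sysman/emd"
```

No output means no .err files were found (or the directories do not exist, since find errors are suppressed).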

Change sysman password

This works up to release 11.1. For a more complete guide, see Metalink note 270516.1 or note 259379.1.
For DB Control release 11.2 and higher, you need to set the environment variable ORACLE_UNQNAME to the value of the DB_UNIQUE_NAME database parameter.

Changing this password is easy as long as it is done correctly. It is a three step process.
Step 1. Change the password in the traditional manner.

SQL> alter user sysman identified by &new_password account unlock;

Step 2. Change the password in the emoms.properties file.

vi ${AGENT_HOME}/sysman/config/emoms.properties

Change the following 2 lines by entering the clear text password where the encrypted password is, and set True to False

oracle.sysman.eml.mntr.emdRepPwd=<new password here>
oracle.sysman.eml.mntr.emdRepPwdEncrypted=False

Step 3. Restart the agent, which picks up the new password (and encrypts it).

emctl restart agent
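
Step 2 can be done without an interactive editor. The sketch below is one possible approach, not part of the original procedure: it rewrites the two property lines with awk and prints the result, leaving the original file untouched so it can be reviewed first. The function name is hypothetical. The password goes to awk through the environment, so characters such as & or / need no escaping.

```shell
# Print FILE with the repository password set to NEW_PASSWORD in clear text
# and the "encrypted" flag set to False. The input file is not modified.
# Usage: set_rep_password FILE NEW_PASSWORD
set_rep_password() {
    NEW_PWD=$2 awk '
        /^oracle\.sysman\.eml\.mntr\.emdRepPwd=/ {
            print "oracle.sysman.eml.mntr.emdRepPwd=" ENVIRON["NEW_PWD"]
            next
        }
        /^oracle\.sysman\.eml\.mntr\.emdRepPwdEncrypted=/ {
            print "oracle.sysman.eml.mntr.emdRepPwdEncrypted=False"
            next
        }
        { print }
    ' "$1"
}

# Example (commented out; review the new file before moving it into place):
# f=${AGENT_HOME}/sysman/config/emoms.properties
# set_rep_password "$f" 'new_password' > "$f.new"
```

Keep in mind that the password appears on the command line and in the new file in clear text until the restart in Step 3 encrypts it.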

Change dbsnmp password

Changing this password is easy as long as it is done correctly. It is a three step process.

Change the password in the traditional manner

SQL> alter user dbsnmp identified by &new_password account unlock;

Change the password in the targets.xml file

vi ${AGENT_HOME}/sysman/emd/targets.xml

Change the following line by entering the clear text password where the encrypted password is, and set True to False

<Property NAME="password" VALUE="<new password>" ENCRYPTED="FALSE"/>

Restart the agent. Picks up the new password (and encrypts it)

emctl restart agent

Quick Checklist

borrowed from oraxprt.com

    Verify that the Agent on the target machine is up and running using:

    cd <AGENT_HOME>/bin
    emctl status agent


    The command should return output such as:

    emctl status agent
    Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
    Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
    ---------------------------------------------------------------
    Agent Version : 10.2.0.5.0
    OMS Version : 10.2.0.5.0
    Protocol Version : 10.2.0.5.0
    Agent Home : /home/oracle/OracleHomes/agent10g
    Agent binaries : /home/oracle/OracleHomes/agent10g
    Agent Process ID : 24465
    Parent Process ID : 24449
    Agent URL : https://agentmachine.domain:1830/emd/main/
    Repository URL : https://omsmachine.domain:1159/em/upload
    Started at : 2010-04-22 15:35:39
    Started by user : oracle
    ….
    ---------------------------------------------------------------


    which indicates that the Agent has started up fine. Also review the <AGENT_HOME>/sysman/log/emagent.nohup to ensure that the Agent is not re-starting frequently, which can affect the OMS to Agent communication.
    Refer to Note 548928.1: Enterprise Manager Grid Control Agent 10g, Process Control (Start, Stop & Status) Troubleshooting Guide
    Verify that the Agent’s URL, as seen in the Grid Console -> Setup -> Agent name page, is the same as the value configured for EMD_URL in the <AGENT_HOME>/sysman/config/emd.properties file.
    Refer to Note 358953.1: What ports are used in communication between the Grid Control OMS and a Management Agent?
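
The configured EMD_URL can be read straight from emd.properties and compared with what the running agent reports. This is a minimal sketch with hypothetical function names; it assumes the property is written as EMD_URL=<url> on one line, and that the status output contains an "Agent URL : <url>" line as in the sample above.

```shell
# Print the EMD_URL value configured in an emd.properties file.
# Usage: emd_url FILE   (for example "$AGENT_HOME/sysman/config/emd.properties")
emd_url() {
    sed -n 's/^EMD_URL=//p' "$1"
}

# Read "emctl status agent" output on stdin and print the Agent URL field.
status_agent_url() {
    sed -n 's/^[[:space:]]*Agent URL[[:space:]]*:[[:space:]]*//p'
}

# Example (commented out because it needs emctl and AGENT_HOME):
# configured=$(emd_url "$AGENT_HOME/sysman/config/emd.properties")
# running=$(emctl status agent | status_agent_url)
# [ "$configured" = "$running" ] || echo "EMD_URL mismatch: $configured vs $running"
```

The value shown in the Grid Console still has to be compared by eye, since it is held on the OMS side.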

1. OMS / Agent Component level issues

    If the Agent machine is configured with DHCP and/or the IP address of the machine has recently changed, the OMS will not be able to communicate with the Agent.
    Refer to Note 605009.1: Problem: OMS Cannot Communicate with Agent if IP Address of the Grid Agent Machine is Changed
    If there is a rogue emagent process on the target machine, then the OMS log/trace files could show communication errors. Refer to Note 733879.1: Communication: OMS Log/Trace Files Show ‘ERROR eml.OMSHandshake processFailure’ for Agent Already Removed from Grid Console
    If the Agent is not capable of accepting incoming connection requests from the OMS, then the communication will fail. Refer to Note 550452.1: Communication: OMS to Agent Communication Fails with ‘IOException in sending Request :: Broken pipe’
    Verify whether there are multiple Agents installed / discovered from this machine. Refer to Note 435728.1: Communication: OMS to Agent Communication Fails with “Connection refused” if Multiple Agent Targets are Discovered

2. Hostname/IP Address Resolution Issues

If the OMS and Agent components are located on separate machines, then the hostname/IP address resolution should work correctly from the OMS to the Agent machine.
Refer to Note 763844.1: How to Verify the Hostname/IP Address Resolution Between the 10g Enterprise Manager Grid Control Components?

If the OMS is unable to resolve the hostname / IP address of the Agent machine, the <OMS_HOME>/sysman/log/emoms.trc will show errors such as below, when trying to access the Agent Homepage in the Grid Console:

2010-04-26 12:01:51,405 [EMUI_12_01_26_/console/admin/rep/emdConfig/emdTargetsMain$target=agentmachine.domain_3A3872$type=oracle_emd] ERROR emdConfig.EmdConfigTargetsData getEmdUploadData.1732 - IOException in sending Request :: No route to host


To verify the hostname / IP address resolution from the OMS to the Agent machine, follow the steps below:

    Collect the following details on the Agent machine:
        Hostname and the corresponding IP Address on which the Agent is configured:

        cd <AGENT_HOME>/bin
        emctl status agent
        Oracle Enterprise Manager 10g Release 5 Grid Control 10.2.0.5.0.
        Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved.
        ---------------------------------------------------------------
        Agent Version : 10.2.0.5.0
        OMS Version : 10.2.0.5.0
        Protocol Version : 10.2.0.5.0
        Agent Home : /home/oracle/OracleHomes/agent10g
        Agent binaries : /home/oracle/OracleHomes/agent10g
        Agent Process ID : 24465
        Parent Process ID : 24449
        Agent URL : https://agentmachine.domain:1830/emd/main/
        Repository URL : https://omsmachine.domain:1159/em/upload
        Started at : 2010-04-22 15:35:39
        Started by user : oracle


        The hostname is the one seen in the ‘Agent URL’ field.
        Obtain the IP address for this hostname using:

        ping <agentmachine.domain>

         
        Output of these commands:

        ping <IP address of the Agent machine>
        ping <hostname.domain of the Agent machine>
        ping <hostname of the Agent machine>
        nslookup <IP address of the Agent machine>
        nslookup <hostname.domain of the Agent machine>
        nslookup <hostname of the Agent machine>

         
    Collect the following details from the OMS machine:

    ping <IP address of the Agent machine>
    ping <hostname.domain of the Agent machine>
    ping <hostname of the Agent machine>
    nslookup <IP address of the Agent machine>
    nslookup <hostname.domain of the Agent machine>
    nslookup <hostname of the Agent machine>

Compare the output of the above commands on the OMS and Agent machines; the resolved names and addresses should match. If there is a difference or an error, enlist the help of your System / Network Administrator to correct the configuration in the hosts file or the DNS.
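
To make sure exactly the same six checks are run on both machines, the command list can be generated once and piped to a shell on each side. This is a sketch only: the function name is hypothetical, and the ping -c 1 option (send a single packet) is an assumption that holds on Linux but may need changing on other platforms.

```shell
# Print the six resolution checks for one Agent machine, one command per line.
# Usage: resolution_checks IP FQDN SHORTNAME
resolution_checks() {
    for target in "$1" "$2" "$3"; do
        printf 'ping -c 1 %s\n' "$target"
    done
    for target in "$1" "$2" "$3"; do
        printf 'nslookup %s\n' "$target"
    done
}

# Example (commented out; run on both the OMS and the Agent machine,
# then compare the two files):
# resolution_checks 20.20.20.20 agentmachine.domain agentmachine | sh > /tmp/resolution.$(hostname).txt 2>&1
```

When comparing the two files, look at the resolved names and addresses; ping timings will naturally differ between machines.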

Note:

1. If all the above commands work fine but the OMS still fails to communicate with the Agent, then stop and restart the OMS once to reset the TCP caching

cd <OMS_HOME>/opmn/bin
./opmnctl stopall
./opmnctl startall


2. If the Agent machine has multiple NIC cards / IP addresses, the Agent can be bound to a particular hostname / IP address combination using steps in:
Note 390444.1: How to: Tell the agent to listen to only one specific NIC Network Interface Card?

If the hostname / IP address resolution works fine from the OMS to the Agent but the communication still fails, then check for the presence of a Firewall or Proxy Server in the setup using the steps below.

3. Firewall Setup / Proxy Server Issues

For details about configuring the Firewall and using the Proxy Server for the EM components, refer to
Note 1088393.1: How to Verify the Communication Between the 10g Enterprise Manager Grid Control Components via Firewall/Proxy?

    If the Agent port is blocked, then the <OMS_HOME>/sysman/log/emoms.trc will show:

    2008-12-01 11:21:25,535 [EMUI_11_17_40_/console/admin/rep/emdConfig/emdTargetsMain$target=agentmachine.domain_3A3872$type=oracle_emd] ERROR emdConfig.EmdConfigTargetsData getEmdTargetsList.1767 - CommException:
    Unable to get list of targets from emd-getEmdTargetsList()
    2008-12-01 11:21:25,541 [EMUI_11_17_40_/console/admin/rep/emdConfig/emdTargetsMain$target=agentmachine.domain_3A3872$type=oracle_emd] ERROR emdConfig.EmdConfigTargetsData getEmdTargetsList.1769 - Connection timed out oracle.sysman.emSDK.emd.comm.CommException: Connection timed out
    The following error is displayed when trying to look at the Targets -> Agent Host -> Performance page:

    An error has occurred!

    Unable to obtain data for target solaris.oracle.com. The target may be down. Switching to the last 24 hrs view

     
    Incorrect Proxy server configuration on the OMS side can cause the problems described in
    Note 395717.1: Communication: OMS to Agent Communication Fails With ‘Cannot Establish Proxy Connection’ Due to Proxy-Related Settings

To verify the communication between OMS to Agent machine, when Firewall / Proxy server is in use:

    Identify the Agent port and URL using the steps in
    Note 358953.1: What ports are used in communication between the Grid Control OMS and a Management Agent?
    Test the connectivity to the Agent URL from the OMS machine, using one of the following methods:
        Open a web-browser on the OMS machine and try to access these URL’s:

        http://agentmachine.domain:agentport/em/upload
        OR
        https://agentmachine.domain:agentport/em/upload


        The URL must return an output similar to:

        EMAgent10.1.0.2.0
        Congratulations, EMAgent is working!

         
        Use telnet

        telnet agentmachine.domain <agent port>


        Sample output:

        telnet agentmachine.domain 3872
        Trying 20.20.20.20...
        Connected to agentmachine.domain.
        Escape character is '^]'.


        If the access to the port is blocked due to a firewall, then the above command will fail with:

        telnet agentmachine.domain 3872
        Trying 20.20.20.20...
        telnet: connect to address 20.20.20.20: Connection refused

         
        Use wget

        wget <agent http url>
        OR
        wget --no-check-certificate <agent https url>

If any of the above commands fail, please contact your Network Administrator to determine if there is a Firewall / Proxy Server in use and check the configuration.
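
Where telnet is not installed, the same port test can be approximated from the shell. This is a minimal sketch under stated assumptions: it needs bash (for its /dev/tcp redirection) and the timeout command from GNU coreutils. The function name is hypothetical.

```shell
# Succeed if a TCP connection to HOST:PORT can be opened within the timeout.
# Usage: port_open HOST PORT [TIMEOUT_SECONDS]   (timeout defaults to 5)
port_open() {
    # bash -c receives HOST as $0 and PORT as $1; opening /dev/tcp/HOST/PORT
    # makes bash attempt the TCP connection.
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (commented out; run it from the OMS machine):
# if port_open agentmachine.domain 3872; then
#     echo "agent port reachable"
# else
#     echo "agent port blocked or closed"
# fi
```

A non-zero result covers both a refused connection and a timeout, so it tells you the port is unreachable but not why.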


References

  • NOTE:235290.1 - Understanding the Enterprise Manager Management Agent 10g ‘emd.properties’ File
  • NOTE:358953.1 - What ports are used in communication between the Grid Control OMS and a Management Agent?
  • NOTE:471842.1 - Understanding Proxy Settings in Enterprise Manager Grid Control

Some useful Metalink Master Documents related to Grid Control

  • Master Index for Managing Oracle Database and Listener with Grid Control [ID 1304021.1]
  • Master Note for 10g Grid Control Agent Process Control (Start, Stop & Status) & Configuration [ID 1082009.1]
  • How to Run the RDA against a Grid Control Installation [ID 1057051.1]
  • How to Run the RDA against a Grid Control Installation Release 11g [ID 1190193.1]
  • Grid Control Target Maintenance: Steps to Diagnose Issues Related to "Agent Unreachable" Status [ID 271126.1]
  • Master Note for 10g Grid Control Enterprise Manager Communication and Upload issues [ID 1086343.1]
  • Master Note for Target Maintenance Through 10g Enterprise Manager Grid Control [ID 1202453.1]
  • Receiving agent unreachable notification emails very often after 10.2.0.4 agent upgrade [ID 752296.1]
  • Healthcheck Metric failing for a 10.2.0.4 Target Database with 10.2.0.4 Agent [ID 602633.1]