Parsing Oracle TNS Files in Python
What? And Why?
This probably seems like a really niche first post, and it is. I mean... it really is super niche. But during my one workday officially "between jobs," I decided to work on this problem because it was something that would have been really useful over the past few years.
Lots of companies use Oracle databases, especially those with legacy enterprise IT infrastructure. Some companies have thousands of database instances, potentially across many data centers, geographical regions, and even hosted on-premise or in the cloud. Why so many database instances? A lot of companies use the Development, [System Integration Testing,] User Acceptance Testing, Production deployment strategy for quality assurance for each database. This means that a big company that already might have thousands of databases used for specific organizations (e.g., different areas of research, HR, sales, manufacturing, etc.) and apps (e.g., specific inventory management software) can multiply that number by 3-4x for the different QA deployments. This can be a lot to keep track of!
If you are a data scientist at a company using Oracle and have one (or many) databases that are managed by a large IT organization and centrally defined somewhere, you may find this post useful. I know I found myself in the situation where I needed to connect to a database in Python, I knew the name of the database, but I didn't know all the connection information. Rather than write this parser earlier, I lazily would go to the files where the connection information is defined, open it in my favorite text editor, search for the database name, and manually enter the connection information into my script. Yuck. That's definitely not the way we do things these days.
The TNS File
Many companies will organize all these connections into one or many tnsnames.ora files. My previous company had a half-dozen of these files for different parts of the organization, each with hundreds to potentially thousands of entries. These are the files that I'd manually search through. Sometimes I'd have to look in 2-3 either because I didn't know where the database was defined or just because of a brain-fart.
These files are long lists of net service names mapped to connect descriptors, which can essentially be thought of as instructions used to connect to Oracle database instances. Here's a really simple example of an Oracle TNS file that's not far off from what you might find in the wild:
MYDB =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = server01)(PORT = 25881))
(ADDRESS = (PROTOCOL = TCP)(HOST = server01)(PORT = 25882))
)
(CONNECT_DATA =
(SID = MYDB01)
)
)
OTHERDB.DOMAIN, ALIAS_FOR_OTHERDB.DOMAIN =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)
(HOST = server02)
(PORT = 25881)
))
(CONNECT_DATA = (SID = MYDB02))
)
)A couple things:
- Spaces usually don't matter. Any whitespace around parentheses, equal signs, and other tokens are usually ignored. The one place they do matter is when listing keyword-value pairs on multiple lines, which requires a space to differentiate from a new parameter. I find this quirk (mentioned here) a little weird because it ignores the current level of parenthesis nesting.
- The spec is extensive. Although it may not seen to be that big at first, there are many optional keywords, some keywords can appear in different places for various reasons (e.g., ADDRESS in DESCRIPTION or ADDRESS_LIST), Oracle's parser seems very forgiving of keyword-values appearing in odd places, and Oracle's parser seems very robust to syntax formatting issues (e.g., unclosed parentheses).
Oh boy! Something like this could really get out of hand. So, let's scope the goal of what we want to do to something tangible: given a net service name, provide the CONNECT_DATA information as well as all the ADDRESS definitions within the connect descriptor.
This should be enough information to connect to a database and start querying!
Parsing Approach
To accomplish our task, we are going to do the following:
- Write a grammar using pyparsing that parses the tnsnames.ora file. I've tried this on some gnarly formatted files, and it works pretty well (although improvements are absolutely welcome!).
- Process the parsed results to extract the ADDRESS and CONNECT_DATA information we need to create an SQLAlchemy engine.
- Query some data!
Writing the Grammar
The pyparsing package allows one to easily define Parsing Expression Grammars (PEGs), which feel a lot like writing "old school" context free grammars back in grad school. It feels pretty natural to express your grammar through code. In fact, it felt very EBNF-like.
Let's take a look at the entire grammar:
import pyparsing as pp
# 1. Literals
VAR = pp.Word(pp.alphas + "_", pp.alphanums + "_").setName('variable')
SPACE = pp.Suppress(pp.Optional(pp.White()))
EQUALS = SPACE + pp.Suppress('=') + SPACE
OPEN = pp.Suppress('(') + SPACE
CLOSE = pp.Suppress(')') + SPACE
INTEGER = pp.Optional('-') + pp.Word(pp.nums) + ~pp.Char(".")
INTEGER.setParseAction(lambda t: int(t[0]))
FLOAT = pp.Optional('-') + pp.Word(pp.nums) + pp.Char('.') + pp.Optional(pp.Word(pp.nums))
FLOAT.setParseAction(lambda t: float(t[0]))
STRING = pp.Word(pp.alphanums + r'_.-')
# 2. Literal assignment expressions: (IDENTIFIER = VALUE)
INTEGER_ASSIGNMENT = pp.Group(OPEN + VAR + EQUALS + INTEGER + CLOSE)
FLOAT_ASSIGNMENT = pp.Group(OPEN + VAR + EQUALS + FLOAT + CLOSE)
STRING_ASSIGNMENT = pp.Group(OPEN + VAR + EQUALS + STRING + CLOSE)
# 3. Nested object assignment
ASSIGNMENT = pp.Forward()
NESTED_ASSIGNMENT = pp.Group(OPEN + VAR + EQUALS + ASSIGNMENT + CLOSE)
ASSIGNMENT << pp.OneOrMore(INTEGER_ASSIGNMENT |
FLOAT_ASSIGNMENT |
STRING_ASSIGNMENT |
NESTED_ASSIGNMENT)
# 4. Net service name(s): NAME(.DOMAIN)[, NAME(.DOMAIN)...]
NET_SERVICE_NAME = pp.OneOrMore(pp.Word(pp.alphas + '_' + '.', pp.alphanums + '_' + '.')
+ pp.Optional(pp.Suppress(',')))
# 5. Full TNS entry
TNS_ENTRY = NET_SERVICE_NAME + EQUALS + ASSIGNMENT
Here we have created a relatively simple grammar building up to the tns_entry definition, which you'll see is what we need to parse the file.
Let's go through the numbered sections in the code:
- Literals are our fixed-value symbols that we'll use to build up more complex expressions. Some of these symbols have elements of their definition that are optional or suppressed. There's an important distinction between the two: optional means the element does not have to appear in the string in order to match that symbol, whereas suppressed means that the matching element in the string will not be included in the parsed result. For example, the open parenthesis is defined as OPEN = pp.Suppress('(') + SPACE because we don't actually need "(" to appear in the parsed result. Another thing to note is the assignment of parse actions to these simple literals, like casting a integer or float to the actual Python type.
- These simple assignment expressions are a construction of the literals defined in the first section. They provide definitions for integers, floats and strings. Note that STRING is defined last to allow FLOAT and INTEGER to take precedence, which arises due to order-dependence imposed by PEGs.
- The ASSIGNMENT symbol is used recursively in its own definition, so it is initially declared using pp.Forward()- think of it like a C/C++ forward declaration, which allows you to use it before it is actually defined. We used the forwardly declared ASSIGNMENT in NESTED_ASSIGNMENT, which parses the complex nesting seen in our example tnsnames.ora file above. We then actually define ASSIGNMENT using the << operator as one of any simple assignment (from section 2) or as a NESTED_ASSIGNMENT that we just defined. Because ASSIGNMENT is used recursively within its own definition, it will parse to arbitrary levels of nesting.
- As mentioned before, each entry in the tnsnames.ora file starts with a net service name. The net service name is basically an alias (or multiple comma-delimited aliases) for the connection descriptor that describes how to get to the database. Here we provide a way to parse out the alias(es).
- We put it all together to define TNS_ENTRY, which parses the net service name that starts the entry and the arbitrarily nested set keyword-value attributes of the TNS entry.
Testing the Parser
We can parse our two mock definitions above using this parser:
TNS_NAMES_ORA = """
MYDB =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = server01)(PORT = 25881))
)
(CONNECT_DATA =
(SID = MYDB01)
)
)
OTHERDB.DOMAIN, ALIAS_FOR_OTHERDB.DOMAIN =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)
(HOST = server02)
(PORT = 25881)
))
(CONNECT_DATA = (SID = MYDB02))
)
)
"""
# Parse the "file"
for tokens, _start, _end in TNS_ENTRY.scanString(TNS_NAMES_ORA):
print(tokens.asList())
This results in the following output:
[['MYDB'],
['DESCRIPTION',
['ADDRESS_LIST',
['ADDRESS', ['PROTOCOL', 'TCP'], ['HOST', 'server01'], ['PORT', 25881]]],
['CONNECT_DATA', ['SID', 'MYDB01']]]]
[['OTHERDB.DOMAIN',
'ALIAS_FOR_OTHERDB.DOMAIN'],
['DESCRIPTION_LIST',
['DESCRIPTION',
['ADDRESS_LIST',
['ADDRESS', ['PROTOCOL', 'TCP'], ['HOST', 'server02'], ['PORT', 25881]]],
['CONNECT_DATA', ['SID', 'MYDB02']]]]]
Looks like we've made a lot of progress - this already looks like something that can be processed pretty easily!
Processing the Parser Output
Ok! Let's finish this up. Our goal is to parse out all the information needed to create a SQLAlchemy engine. Let's take a look at the minimum bits needed (or at least that I needed!):
- Driver: This is always going to be oracle+cx_oracle, because under the hood I'm actually asking SQLAlchemy as an abstraction over the cx_Oracle package, which is actually talking to the database.
- Host: Name of the server hosting the database instance, which appears in ADDRESS.
- Port: Port on the host used to connect to the database instance, which appears in ADDRESS.
- SID: The Oracle System ID, which is the unique name of the database and appears in CONNECT_DATA.
We are going to write a quick generator that parses the mock tnsnames.ora file that we created and emits a dictionary with the above connection information for every ADDRESS and CONNECT_DATA combination encountered for a given net service name.
Note: This is very simple and may not be right in every situation, such as if you have a more complex entry in your TNS file. It worked for me, but is mainly illustrative for your particular situation.
Another Note: If you were one of my CS professors, you might want to brace yourself. Two of the functions below can most definitely share a common function to avoid code duplication, but I didn't want to spend more time writing this example.
def _parse_addresses(tns_entry, addresses):
"""
Parse ADDRESS keywords from the a TNS entry
:param definition: Unparsed part of the TNS entry
:param addresses: List of addresses parsed
"""
keyword = tns_entry[0]
# Base Case: We found an ADDRESS, so extract the data
# and do not recurse into it
if keyword.upper() == 'ADDRESS':
port = None
host = None
for k, v in tns_entry[1:]:
if k == 'PORT':
port = v
elif k == 'HOST':
host = v
if port is None:
print('WARNING: Ignoring ADDRESS due to missing PORT')
elif host is None:
print('WARNING: Ignoring ADDRESS due to missing HOST')
addresses.append({'host': host, 'port': port})
# Else recursively descend through the definition
for d in tns_entry[1:]:
# Only parse sub-lists, not literals
if isinstance(d, list):
_parse_addresses(d, addresses)
def _parse_connect_data(tns_entry, sids):
"""
Parse CONNECT_DATA keywords from the a TNS entry
:param definition: Unparsed part of the TNS entry
:param sids: List of Oracle SIDs
"""
keyword = tns_entry[0]
# Base Case: We found a CONNECT_DATA, so extract the data
# and do not recurse into it
if keyword.upper() == 'CONNECT_DATA':
sid = None
for k, v in tns_entry[1:]:
if k == 'SID':
sid = v
if sid is None:
print('WARNING: Ignoring CONNECT_DATA due to missing SID')
sids.append(sid)
for d in tns_entry[1:]:
# Only parse sub-lists, not literals
if isinstance(d, list):
_parse_connect_data(d, sids)
def get_connection_info(net_service_name: str, tns_string: str):
"""
Generator over all simple connections inferred from a TNS entry
:param net_service_name: Net service name to return connection info for
:param tns_string: tnsnames.ora file contents
"""
# Parse the TNS entries and keep the requested definition
definition = None
for tokens, _start, _end in TNS_ENTRY.scanString(tns_string):
if net_service_name in tokens.asList()[0]:
definition = tokens.asList()[1]
break
# Check if we found a definition
if definition is None:
raise KeyError(f'No net service named {net_service_name}')
# Look for all the ADDRESS keywords
addresses = []
_parse_addresses(definition, addresses)
# Look for all CONNECT_DATA keywords
sids = []
_parse_connect_data(definition, sids)
# Emit all combinations
for address in addresses:
for sid in sids:
yield {'sid': sid, **address}
# Try it out!
for connection_info in get_connection_info('MYDB', TNS_NAMES_ORA):
print(connection_info)This outputs:
{'sid': 'MYDB01', 'host': 'server01', 'port': 25881}
{'sid': 'MYDB01', 'host': 'server01', 'port': 25882}At this point, we could use it directly in SQLAlchemy to create an engine in something like this (WARNING not tested!):
from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL
from sqlalchemy import exc
def get_sqlalchemy_engine(user: str, password: str, net_service_name: str, tns_string: str):
"""
Return the first valid engine that can be created from a TNS entry
:param net_service_name: Net service name to return connection info for
:param tns_string: tnsnames.ora file contents
"""
for connection_info in get_connection_info('MYDB', TNS_NAMES_ORA):
try:
engine = create_engine(
URL('oracle+cx_oracle',
username=user,
password=password,
host=connection_info['host'],
port=connection_info['port'],
database=connection_info['sid']))
# Test the engine
with engine.connect():
pass
# Return valid engine
return engine
except exc.SQLAlchemyError:
# Try the next configuration
continue
# We don't want to get here!
raise Exception(f'Could not create a valid engine for {net_service_name}')
Final Thoughts
That was quite a bit of work, but we managed to parse a tnsnames.ora file and connect to an Oracle database instance defined within it. This was really basic and more illustrative than anything, but I hope someone out there finds it helpful. I certainly would have been happier for the code years ago! If you have any suggestions or corrections, please let me know!
Comments
Post a Comment