Generic Troubleshooting Steps
As an integration expert, I would follow these steps to troubleshoot an integration:
- Gather Information: The first step is to gather as much information as possible about the problem. This includes:
  - GLU.Engine configuration and build version details.
  - GLU.Engine logs in DEBUG mode. For ISO8583, be sure to gather the Q2 logs too.
  - A Postman Test Pack with sample messages, so that the issue can be reproduced in a test environment.
  - Details of the transaction that is causing the issue: the message format, the communication channel used, and any error messages that are generated.
  - If failing payloads contain encrypted elements, have the sample messages decrypted so that the parameter values can be inspected. Sometimes a failure is caused by a parameter value that breaks a validation rule.
  - Documentation such as API specs, solution design diagrams, sequence diagrams, etc.
  - Time of day may also be relevant: is there any unusual system behaviour at the time the issue is observed (e.g. peak load or system maintenance activities)?
- Analyse the Data: Analyse the data gathered in the previous step. This includes reviewing the message payloads and checking for any inconsistencies or errors. It is important to understand how the message was constructed, what data was sent, and what data was received.
- Check the Network Connection: Check the network connection between the two systems. This includes reviewing network logs, checking for packet loss, and testing the connection to ensure that it is stable and reliable.
- Verify the Message Format: Verify that the message format is correct. This includes checking the message structure, message type, message class, and other relevant parameters and fields. For protocol-related issues, such as incorrect message formats or sequencing, tcpdump (a command-line packet analyser) can help you identify the source of the problem. By capturing and analysing the network traffic, you can determine whether the issue lies in the protocol itself or in your system’s implementation of it.
- Check the Configuration: Check the configuration of both systems. This includes reviewing system settings, ensuring that the correct protocols and ports are being used, and verifying that the necessary libraries and drivers are installed. Check that the message format, data elements, and fields are correctly configured.
- Perform functional testing: Conduct functional testing to identify any issues in the message flow. Verify that the message is being sent and received correctly and that the message content is correct.
- Collaborate with Stakeholders: Collaborate with stakeholders, including developers, system administrators, and end-users. This includes sharing information, discussing possible solutions, and determining the best course of action to resolve the issue.
- Test the Solution: Test the proposed solution to confirm that the issue has been resolved. This includes verifying that the message is now being processed correctly and that all data is being transmitted and received as expected.
- Document findings: Document your findings and the steps taken to resolve the issue. This documentation can be used as a reference for future troubleshooting efforts.
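As a rough illustration of the "Check the Network Connection" step, the sketch below tries to open a TCP connection and reports how long the attempt took. The function name probe_tcp and the default timeout are illustrative choices, not part of GLU.Engine; it only shows whether a port is reachable, not whether the application behind it is healthy.

```python
import socket
import time


def probe_tcp(host: str, port: int, timeout_s: float = 3.0) -> dict:
    """Try to open a TCP connection and report whether it succeeded and how long it took."""
    started = time.monotonic()
    try:
        # create_connection resolves the host and connects; failures raise OSError
        with socket.create_connection((host, port), timeout=timeout_s):
            ok, error = True, None
    except OSError as exc:
        ok, error = False, str(exc)
    elapsed_ms = (time.monotonic() - started) * 1000
    return {
        "host": host,
        "port": port,
        "ok": ok,
        "elapsed_ms": round(elapsed_ms, 1),
        "error": error,
    }
```

Calling probe_tcp with the host and port of the system under test from the machine that runs the GLU.Engine gives a quick first answer before moving on to packet captures.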
Diagnostic Tools
Here are some diagnostic tools that you can use to troubleshoot system integration issues:
- Wireshark: Wireshark is a popular network protocol analyser that can capture and analyse network traffic in real-time. You can use it to analyse the message flow and identify any issues with the message format, data elements, and fields.
- Log analysis tools: Tools such as Splunk and the ELK Stack (Elasticsearch, Logstash, and Kibana) can help you analyse the system logs and identify any errors or exceptions that occurred during the transaction.
- Performance monitoring tools: Tools such as Nagios, Zabbix, and Prometheus can help you monitor system performance and identify bottlenecks.
- Network monitoring tools: Tools such as PRTG, SolarWinds, and Nagios can help you monitor network performance and spot network-level problems.
- tcpdump: tcpdump is a command-line packet analyser used to capture and analyse network traffic. If you suspect a problem with the network connectivity between systems, capturing the traffic between them shows whether packets are being dropped, delayed, or corrupted, so that you can take the appropriate corrective action.
By using these diagnostic tools, you can effectively troubleshoot system integration issues and resolve them in a timely and efficient manner.
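When a full log analysis tool is not at hand, a few lines of script can give a first impression of when errors cluster. The sketch below counts ERROR and WARN lines per hour. The line layout it expects (a timestamp such as "2024-05-01 14:03:22.123" followed directly by the level) is an assumption; adjust the pattern to match the layout of your own logs.

```python
import re
from collections import Counter

# Assumed layout: "2024-05-01 14:03:22.123 ERROR ..." (timestamp, then the level)
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\S*\s+(ERROR|WARN)\b")


def errors_per_hour(lines):
    """Count ERROR and WARN lines per (hour, level) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            hour, level = match.groups()
            counts[(hour, level)] += 1
    return counts
```

A spike in one hour bucket is a hint to compare against peak load or maintenance windows at that time.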
Timeout Troubleshooting
When troubleshooting integration issues where timeout errors are suspected, the following steps and tools can be used to assist the troubleshooting process:
- Identify the Symptoms: Identify the symptoms of the timeout error. This can include slow response times, failed requests, or error messages related to timeouts. It is important to understand the nature of the issue and the specific symptoms that are being experienced.
- Check Network Connectivity: Check the network connectivity between the systems. This can include checking for packet loss, latency, or other network issues. Tools such as ping, traceroute, or network diagnostic tools can be used to identify and diagnose network connectivity issues.
- Review System Logs: Review system logs to identify any errors or warning messages related to the integration. This can include logs from both the sending and receiving systems as well as the GLU.Engine logs. It is important to identify any errors or warning messages that may be related to the timeout issue.
- Check Timeout Settings: Review the timeout settings on both the sending and receiving systems, as well as on any middleware or other components involved in the integration. Ensure that the timeouts are configured appropriately and are not set too low.
- Assess System Performance: Assessing the performance of an integration is not a trivial exercise, as it can involve setting up a dedicated performance testing environment to monitor CPU usage, memory usage, and disk I/O while simulated high volumes of traffic are being processed. Load testing tools can be used to simulate various load scenarios and identify any performance or scalability issues that may be causing the timeout errors. Performance monitoring tools can be used to identify any bottlenecks that may be contributing to them. There is more detail on how to performance test your GLU.Engines here and here.
- Work with Vendors: The final step is to work with vendors or service providers to identify and resolve any issues related to the integration. This may involve working with the vendor’s support team or development team to identify and resolve any issues that may be causing the timeout errors.
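One common cause of confusing timeout behaviour is a caller that gives up before the component it calls does. The sketch below checks a chain of hops for that pattern. It assumes the rule of thumb that each caller should wait longer than the hop it calls; the hop names and values are hypothetical, and the function does not read any real GLU.Engine configuration.

```python
def find_timeout_inversions(hops):
    """Return (caller, callee) pairs where the caller does not wait longer than the callee.

    hops is a list of (name, timeout_seconds) ordered from the original caller
    to the final backend.
    """
    problems = []
    for (caller, caller_timeout), (callee, callee_timeout) in zip(hops, hops[1:]):
        if caller_timeout <= callee_timeout:
            problems.append((caller, callee))
    return problems


# Hypothetical chain: the engine gives up after 20s while the backend may take 25s
chain = [("client", 30), ("engine", 20), ("backend", 25)]
print(find_timeout_inversions(chain))
```

For the hypothetical chain above, the engine/backend pair is flagged: the engine would report a timeout while the backend may still complete the request.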
ISO8583 Troubleshooting
Here are some diagnostic tools that you can use to troubleshoot ISO8583 system integration issues:
ISO8583 message viewer: An ISO8583 message viewer is a tool that can help you view and decode ISO8583 messages. It can help you understand the message content and identify any errors in the message flow. Here are some ISO8583 message viewer tools:
- ISO8583 Message Decoder by Sarel: This is a free web-based tool that allows you to decode and view ISO8583 messages in various formats such as HEX, ASCII, and BINARY. It supports a wide range of ISO8583 message versions and can display detailed information about each field.
- MessageWare ISO8583 Analyser: This is a paid software tool that provides a user-friendly interface for analysing ISO8583 messages. It can decode and display the message content, perform field validation, and highlight any errors or issues in the message.
- ISO8583 Message Editor by Sarel: This is a free web-based tool that allows you to create, edit, and send ISO8583 messages. It supports a wide range of ISO8583 message versions and can generate messages in various formats such as HEX, ASCII, and BINARY.
- ISO8583 Toolkit by Proxymitron: This is a paid software tool that provides a comprehensive set of ISO8583 tools for message creation, editing, and analysis. It includes a message editor, message analyser, message simulator, and message validation tool.
- ISO8583 Message Viewer by Pantor: This is a free and open-source software tool that allows you to view and analyse ISO8583 messages. It supports a wide range of ISO8583 message versions and can display detailed information about each field.
- tcpdump can be used to capture the binary data being sent and received as ISO8583 messages and to verify its integrity. By analysing the binary data in the captured packets, you can confirm that the messages are being transmitted correctly and are not being altered in transit.
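To give a feel for what a message viewer does first, the sketch below decodes the message type indicator (MTI) and the bitmap of a message to list which fields are present. It assumes one particular encoding: a 4-character ASCII MTI followed by a hex-encoded primary bitmap (16 hex characters) and, if bit 1 is set, a hex-encoded secondary bitmap. Real interfaces may use binary or BCD encodings and a length header instead, so treat this as an illustration rather than a general parser.

```python
def bitmap_fields(bitmap_hex: str, first_field: int = 1) -> list:
    """Return the field numbers whose bits are set; the leftmost bit is first_field."""
    value = int(bitmap_hex, 16)
    width = len(bitmap_hex) * 4
    return [first_field + i for i in range(width) if value & (1 << (width - 1 - i))]


def decode_header(message: str) -> dict:
    """Decode the MTI and bitmap(s) of a message in the assumed hex-text layout."""
    if len(message) < 20:
        raise ValueError("message too short for an MTI and a primary bitmap")
    mti = message[:4]
    fields = bitmap_fields(message[4:20])
    has_secondary = 1 in fields  # bit 1 signals that a secondary bitmap follows
    if has_secondary:
        if len(message) < 36:
            raise ValueError("bit 1 is set but the secondary bitmap is missing")
        # Drop bit 1 (the indicator itself) and append fields 65 to 128
        fields = fields[1:] + bitmap_fields(message[20:36], first_field=65)
    return {"mti": mti, "has_secondary": has_secondary, "fields": fields}


print(decode_header("02007220000000000000"))
```

Comparing the decoded field list with the fields the interface specification requires for that MTI is often the quickest way to spot a missing or unexpected data element.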
ISO8583 simulator: An ISO8583 simulator can help you simulate transactions and test the message flow. It can help you identify any issues with message routing, message format, and message content.
There are several ISO8583 simulator tools available, each with their own unique features and capabilities. Here are some popular ISO8583 simulator tools that you can consider:
- Postman – Postman is a popular API development tool that can also be used as an ISO8583 simulator. It allows users to create and send ISO8583 messages using its API testing capabilities.
- jPOS – jPOS is a Java-based framework that provides ISO8583 message processing capabilities. It also includes a simulator tool that allows users to create, send, and receive ISO8583 messages.
- ISO8583 Simulator – ISO8583 Simulator is a standalone simulator tool that can be used to create, send, and receive ISO8583 messages. It supports various message types and can be configured to simulate different network scenarios.
- M365 ISO8583 Simulator – M365 ISO8583 Simulator is a web-based simulator tool that can be used to test ISO8583 message processing. It supports various message types and can be customised to simulate different network scenarios.
- Paragon Testing Tool – Paragon Testing Tool is a comprehensive testing tool that includes ISO8583 message processing capabilities. It allows users to create, send, and receive ISO8583 messages and also includes advanced testing features such as load testing and stress testing.
These are just a few of the many ISO8583 simulator tools available. It’s important to evaluate your specific requirements and choose a simulator tool that best fits your needs.
ISO8583 Packager Issues
If a sending system uses a different packager than the receiving system, there may be issues with the encoding and decoding of the message. The receiving system may not be able to properly interpret the message because the fields may be in a different order, have different lengths, or be missing altogether.
This can result in errors or rejected transactions, which could cause delays or even financial losses. In some cases, the receiving system may be able to detect the issue and reject the message outright, while in other cases, the message may be processed incorrectly, leading to further issues down the line.
To prevent such issues, it is important to ensure that both the sending and receiving systems use compatible packagers that adhere to the same version of the ISO 8583 standard. This can be achieved by using standardised libraries or tools that are commonly used in the financial industry, or by ensuring that both systems are properly configured to communicate with each other.
Mismatched or corrupt ISO8583 packagers, whether used by the sending or receiving systems or by your GLU.Engine, can result in various issues, including:
- Message Rejection: The receiving system may reject the ISO8583 messages sent by the sending system because the message format is not correct. This can lead to communication failures between the systems, causing delays and errors in the transaction processing.
- Data Corruption: If the packager is not configured correctly, it may incorrectly process or interpret the message data, leading to data corruption. This can cause errors in the transaction processing or result in incorrect transaction data being recorded.
- Misinterpretation of Message Fields: If the packager is not configured correctly, it may misinterpret the message fields, leading to incorrect data being recorded or processed.
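A simple way to catch such mismatches early is to compare the field definitions of the two packagers side by side. The sketch below does this for packagers expressed as plain dictionaries that map a field number to a (type, maximum length) pair. This representation and the type labels are hypothetical; real packager definitions (for example jPOS XML packagers) would first need to be parsed into this shape.

```python
def diff_packagers(sender: dict, receiver: dict) -> list:
    """Compare two packager definitions and return human-readable mismatches.

    Each packager maps a field number to a (type, max_length) tuple.
    """
    issues = []
    for field in sorted(set(sender) | set(receiver)):
        sender_def = sender.get(field)
        receiver_def = receiver.get(field)
        if sender_def is None:
            issues.append(f"field {field}: missing from sender packager")
        elif receiver_def is None:
            issues.append(f"field {field}: missing from receiver packager")
        elif sender_def != receiver_def:
            issues.append(f"field {field}: sender {sender_def} != receiver {receiver_def}")
    return issues


# Hypothetical definitions: field 4 differs in length, field 11 exists on one side only
sender = {2: ("LLNUM", 19), 3: ("NUMERIC", 6), 4: ("NUMERIC", 12)}
receiver = {2: ("LLNUM", 19), 3: ("NUMERIC", 6), 4: ("NUMERIC", 10), 11: ("NUMERIC", 6)}
for issue in diff_packagers(sender, receiver):
    print(issue)
```

An empty result does not prove that the packagers are compatible (encoding and padding rules are not compared here), but any reported difference is worth resolving before testing further.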
SOAP Troubleshooting
Prior to launching the GLU.Engines, ensure that the SOAP connectors are operational. The startup process for the GLU.Engine will pause at each SOAP connector until a connection has been established. In the event that a connection is unavailable, the startup process will be terminated.
Non-sensical log error:
If you see this error in the logs:
ERROR in c.q.l.core.rolling.SizeAndTimeBasedRollingPolicy@2094777811 – totalSizeCap of [4 GB] is smaller than maxFileSize [9 GB].
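or something similar, the logging settings in the Application Settings are inconsistent. In this example the Total Size Cap is smaller than the Max File Size, which cannot work. Review the Max File Size, Max History and Total Size Cap values on the Logging level tab.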
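The rule behind this error can be checked before the settings are applied. The sketch below parses size strings such as "4 GB" and confirms that the total size cap is at least as large as the maximum file size. The function names are illustrative, and the accepted units (KB, MB, GB, 1024-based) are an assumption about how the sizes are written.

```python
import re

UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a size such as '4 GB' or '100MB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)\s*", size, flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognised size: {size!r}")
    number, unit = match.groups()
    return int(number) * UNITS[unit.upper()]


def size_cap_is_consistent(max_file_size: str, total_size_cap: str) -> bool:
    """The total size cap must be able to hold at least one full log file."""
    return to_bytes(total_size_cap) >= to_bytes(max_file_size)


# The values from the error message above are inconsistent
print(size_cap_is_consistent("9 GB", "4 GB"))
```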
Performance Troubleshooting
Extracting thread dumps
You will need to install the development package of the JDK to get jstack and jcmd. On Linux environments that use yum, use this command or something similar:
sudo yum install java-1.8.0-openjdk-devel
jstack is an effective command-line tool for capturing thread dumps.
The jstack tool has been included in the JDK since Java 5. If you are running an older version of Java, consider using other options. The jstack tool is shipped in the JDK_HOME\bin folder.
Here is the command that you need to issue to capture a thread dump:
jstack -l <pid> > <file-path>
Where:
pid : the process ID of the application whose thread dump should be captured.
file-path : the path of the file that the thread dump will be written to.
For example:
jstack -l 37320 > /opt/tmp/threadDump.txt
As per the example, the thread dump would be written to this file:
/opt/tmp/threadDump.txt
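Once a thread dump has been captured, a quick count of threads per state often shows where to look first, for example many BLOCKED threads or a pool full of WAITING threads. The sketch below relies on the "java.lang.Thread.State:" lines that jstack prints for each Java thread. The function name is illustrative.

```python
import re
from collections import Counter

# jstack prints one "java.lang.Thread.State: <STATE>" line per Java thread
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def count_thread_states(dump_text: str) -> Counter:
    """Count how many threads in a jstack dump are in each state."""
    return Counter(STATE_RE.findall(dump_text))
```

To use it with the file from the example above, read /opt/tmp/threadDump.txt into a string and pass it to count_thread_states. Comparing the counts from two or three dumps taken a few seconds apart helps separate threads that are stuck from threads that are merely busy.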
Capturing Heap Dumps
jmap is a tool to print statistics about memory in a running JVM. We can use it for local or remote processes. To capture a heap dump using jmap we need to use the dump option:
jmap -dump:[live],format=b,file=<file-path> <pid>
Along with that option, we should specify several parameters:
live : if set, only objects that still have active references are dumped, and objects that are ready to be garbage collected are discarded. This parameter is optional.
format=b : specifies that the dump file will be in binary format. If not set, the result is the same.
file : the file where the dump will be written to.
pid : id of the Java process.
An example would be like this:
jmap -dump:live,format=b,file=/tmp/dump.hprof 12587
Remember that you can easily get the pid of a Java process by using the jps command, or
ps -afe | grep java
Keep in mind that jmap was introduced in the JDK as an experimental tool and is unsupported. In some cases it may therefore be preferable to use other tools instead, for example jcmd <pid> GC.heap_dump <file-path>.
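Heap dumps can be large, so it is worth confirming that a file really is a heap dump before transferring it for analysis. Binary HPROF files begin with the ASCII text "JAVA PROFILE " followed by a format version, and the sketch below checks only that prefix. It does not validate the rest of the file, so a truncated dump would still pass; the function name is illustrative.

```python
def looks_like_hprof(path: str) -> bool:
    """Return True if the file starts with the HPROF header text 'JAVA PROFILE '."""
    with open(path, "rb") as handle:
        # The prefix is 13 bytes long, including the trailing space
        return handle.read(13) == b"JAVA PROFILE "
```

For the example above, looks_like_hprof("/tmp/dump.hprof") should return True once jmap has finished writing the file.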