V&V not for Vendetta
Over the past six years, I have worked on the development and acceptance testing of applications for conducting and supporting clinical trials: applications of various sizes and complexity, involving big data, a huge number of visualizations and views, data warehousing, ETL, and so on. The products are used by doctors, clinical trial managers, and people involved in the control and monitoring of research.
For applications that have, or may have, a direct impact on the life and health of patients, a formal acceptance testing process is required. Acceptance test results, along with the rest of the documentation package, are submitted for audit to the FDA (the US Food and Drug Administration). The FDA authorizes the use of the application as a tool for monitoring and conducting clinical trials. In total, my team has developed, tested, and released to production more than thirty applications. In this article, I will briefly talk about acceptance testing and about improving the tools used for it.
Note: I do not claim to hold the ultimate truth, and I fully understand that much of what I write here is a Captain Obvious monologue. Still, I hope it will be useful both to newcomers and to teams that deal with this in their everyday work, or will at least cheer up those who have simpler processes.
To put clinical trials in everyday terms: before a headache pill reaches a pharmacy counter somewhere on Brighton Beach, it goes through three phases of human trials, which determine how much active substance the tablet should contain, how safe it is, and whether it relieves headaches at all.
What is the FDA in terms of what we do and how does it affect the development process and release cycle?
In fact, the FDA has practically no bearing on the development process itself. We work according to the usual Scrum (to be honest, it is not quite Scrum; they say it is now fashionable to call such a process ‘a modified Scrum’) with a non-sprint release cycle. We do not deliver to production at the end of every sprint (nor at the end of three sprints, nor even ten, if the timeline involves 15 sprints); that is, from the point of view of delivery to the end user, we have a waterfall-like methodology.
In our case, testing is divided into two independent parts with different timelines, estimates, and processes. The first part is the usual in-dev testing, where the tester is an integral part of the development team and completes sprints together with the developers. The second part is the actual acceptance testing and support of related activities (when required). The process is built according to the V&V methodology: user and functional requirements at the input, test scripts and a package of supporting documentation at the output.
The release cycle depends on the project scope; releases are generally not iterative. After a release, the application can remain unchanged for a long time: the break between releases may vary from a couple of months to a year or more.
What is V&V, and how does it affect the acceptance process?
V&V stands for verification and validation. In other words:
Verification: Are we building the product right?
Validation: Are we building the right product?
It means that we must test the functional and user specifications with the necessary and sufficient completeness. For us, the first V turns into technical acceptance testing (SIT), the second into support of UAT, where:
- SIT—System Integration Testing
- UAT—User Acceptance Testing
Requirements coverage is visualized in a Traceability Matrix (a regular table in Excel or Word; I will dwell on it later), which allows tracing from requirements to tests and vice versa. With electronic document management, the matrix is built automatically: the tests are linked to the requirements, which are stored in the same tool as the tests (in our case HP ALM; together, but of course not mixed up). If a requirement is not covered and will not be covered, we justify why.
When is requirement coverage not required?
For example, 21 CFR Part 11 (the FDA rules for electronic records and electronic signatures) contains many requirements that are already covered by Microsoft, so if we use Windows AD for authentication, we do not need to cover those requirements again.
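A minimal sketch of how such tracing could look as a small database of three tables (requirements, tests, and the links between them), using Python's built-in sqlite3 module. All table names, identifiers, and sample rows here are hypothetical and serve only as illustration; this is not the schema of HP ALM or of any tool mentioned in this article.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three tables and hypothetical sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE requirements (
            req_id        TEXT PRIMARY KEY,
            description   TEXT NOT NULL,
            justification TEXT  -- set only when a requirement is deliberately not covered
        );
        CREATE TABLE tests (
            test_id TEXT PRIMARY KEY,
            title   TEXT NOT NULL
        );
        CREATE TABLE links (
            req_id  TEXT NOT NULL REFERENCES requirements(req_id),
            test_id TEXT NOT NULL REFERENCES tests(test_id),
            step    INTEGER NOT NULL,
            PRIMARY KEY (req_id, test_id, step)
        );
    """)
    con.executemany("INSERT INTO requirements VALUES (?, ?, ?)", [
        ("FRS-001", "Chart dots are colored by value range", None),
        ("FRS-002", "Study list can be filtered by country", None),
        ("FRS-003", "User authentication", "Handled by Windows AD, not retested"),
        ("FRS-004", "Export to CSV", None),
    ])
    con.executemany("INSERT INTO tests VALUES (?, ?)", [
        ("TS-01", "Chart rendering"),
        ("TS-02", "Study list filters"),
    ])
    con.executemany("INSERT INTO links VALUES (?, ?, ?)", [
        ("FRS-001", "TS-01", 3),
        ("FRS-002", "TS-02", 1),
        ("FRS-002", "TS-02", 4),
    ])
    return con


def steps_for_requirement(con, req_id):
    """Forward trace: which test steps cover this requirement?"""
    rows = con.execute(
        "SELECT test_id, step FROM links WHERE req_id = ? ORDER BY test_id, step",
        (req_id,),
    )
    return rows.fetchall()


def requirements_for_test(con, test_id):
    """Backward trace: which requirements does this test cover?"""
    rows = con.execute(
        "SELECT DISTINCT req_id FROM links WHERE test_id = ? ORDER BY req_id",
        (test_id,),
    )
    return [row[0] for row in rows]


def unjustified_gaps(con):
    """Requirements that have neither a linked test step nor a justification."""
    rows = con.execute("""
        SELECT r.req_id
        FROM requirements r
        LEFT JOIN links l ON l.req_id = r.req_id
        WHERE l.req_id IS NULL AND r.justification IS NULL
        ORDER BY r.req_id
    """)
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(steps_for_requirement(db, "FRS-002"))
    print(unjustified_gaps(db))
```

In this sketch, a requirement with a justification (such as the hypothetical FRS-003) is treated as deliberately uncovered, so only requirements with neither a link nor a justification are reported as gaps.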
Ultimately, the essence of acceptance testing comes down to testing the product for compliance with the requirements and the requirements for compliance with the product.
A fairly large number of roles take part in the process, which in one form or another are familiar to everyone involved in software development: Developer (Junior, Middle, Senior, Lead), Unit Tester, SIT Tester, Technical Product Owner, Business Product Owner, Scrum Master, Project Manager, Business Analyst, Technical Lead, SIT Test Lead, UAT Test Lead, Global QC, Support, Deployment Engineer and others.
One role is specific to us: Global QC. This is the person on the customer side who is responsible for ensuring that the process requirements (the Software Lifecycle and all sorts of Standard Operating Procedures (SOPs) on the customer side) are observed and fulfilled during development and acceptance, and who subsequently provides the package of documents for external audit.
In the scope of product release, we create a documentation package which incorporates a large number of nesting levels, including documentation that relates to how the product was tested, why it was tested this way and not otherwise, what specifically was in the scope of testing and what was not:
Validation Plan and Team Roster: the PM is responsible for creating the document and getting it approved. It usually includes the system description, a list of development and testing artefacts, the validation strategy, and a list of roles and responsibilities.
Test Strategy: a document that does not belong to a specific application but exists for a product line or a branch of testing. It answers questions such as: how do we determine which part of testing will be automated, and what do we plan to use for automation; how do we plan to conduct manual testing (check-lists, test scripts, both, or something else); how do we decide whether to perform performance, load, or volume testing; and so on.
Test Plan: common for the UAT and SIT teams; includes a brief description of the test object, possible restrictions, environment requirements, timing, test suites, and so on.
Test Suite: a set of tests or checklists grouped by functional area, type of testing, or any other characteristic.
Traceability Matrix: a matrix with traces from requirements to tests. Tracing requirements as evidence of coverage is an important part of the process. Using the identifier of any requirement, you can find the specific step in which this requirement is tested and provide evidence to the reviewer (screenshot, file, etc.) for a specific version of the application (or for each version in which this requirement was formally covered). Therefore, link, link, and once again link the tests to the requirements (tasks) from which you derive the expected result, even if this is not required of you, because it will simplify your life in the future. If that is impossible because the tools do not integrate, you can always leave comments in tasks or tests, build a matrix (the Excel table mentioned above), or create a primitive database of three tables.
PDS: Product Design Specification. The Tech Lead or System Architect is responsible for creating the document. In essence, it is a combination of the high-level and low-level architecture documents (HLA and LLA).
FRS & URS or BRS: functional and user requirements. Usually these are two separate documents, but sometimes there is a single Business Requirements Specification that incorporates both.
Defect Log: a log of bugs identified in the application and/or requirements during formal SIT.
Minor Issues Log: a log of minor corrections to test scripts (typos, missing or redundant requirements, other mistakes).
Test Summary Report: a report on the results of the test phase, which includes the following:
- Information about the builds used for testing (including build numbers, deployment dates, and the reasons for each deployment), the number of test cycles, and test script results (passed/failed).
- A description of deviations from SOPs.
- The list of open defects with justifications.
- A link to the defect log or the defect log itself.
- A link to the minor issues log or the log itself.
- A recommendation about deployment to the production environment.
Deployment Plan: the document used for deployment to production; includes a description of roll-backs.
Validation Summary Report: the closing document for the Validation Plan.
Any documentation approval process may be divided into 3 stages:
- Document preparation.
Let's look at the example of a Test Suite.
To write test scripts and combine them into test suites, we use a standard template approved on the customer side.
Test suite paragraphs:
- Name of the project and the application.
- Release version.
- Name and unique identifier of the suite.
- Description (what we test and which types of testing we use).
- Test scripts.
- List of approvers.
In turn, each test consists of:
- Name and unique identifier of the test script.
- Traceability References.
- Instructions for execution (e.g. how to mask sensitive data).
- Steps (procedure, expected and observed results).
- Test script result.
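The template above can be pictured as a small data structure. The following is only a sketch under assumptions: the class and field names (`Suite`, `Script`, `Step`), the result labels, and the rule for deriving a script result from its steps are illustrative and are not taken from the customer's actual template.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    procedure: str
    expected: str
    observed: Optional[str] = None  # filled in during execution
    passed: Optional[bool] = None   # None means the step has not been executed yet


@dataclass
class Script:
    script_id: str
    name: str
    traceability_refs: List[str]    # requirement identifiers covered by this script
    instructions: str               # e.g. how to mask sensitive data
    steps: List[Step] = field(default_factory=list)

    def result(self) -> str:
        """Assumed rule: a script passes only if every step was executed and passed."""
        if not self.steps or any(s.passed is None for s in self.steps):
            return "incomplete"
        return "passed" if all(s.passed for s in self.steps) else "failed"


@dataclass
class Suite:
    project: str
    application: str
    release_version: str
    suite_id: str
    name: str
    description: str
    scripts: List[Script] = field(default_factory=list)
    approvers: List[str] = field(default_factory=list)

    def untraced_scripts(self) -> List[str]:
        """Scripts with no traceability references, which a reviewer would likely question."""
        return [s.script_id for s in self.scripts if not s.traceability_refs]


if __name__ == "__main__":
    script = Script(
        script_id="TS-01",
        name="Chart rendering",
        traceability_refs=["FRS-001"],
        instructions="Mask site names on screenshots.",
        steps=[Step("Open the chart", "Dots are colored by value range")],
    )
    suite = Suite("Demo project", "Demo app", "1.0", "SUITE-01",
                  "Charts", "Functional testing of charts", [script], ["Global QC"])
    print(script.result(), suite.untraced_scripts())
```

Keeping the traceability references on the script itself makes it cheap to check, before review, that no script is missing a link to the requirements.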
The normal life cycle of a test resembles a waterfall and looks like this:
1. Requirements analysis, selection and application of test design techniques for the most adequate coverage of functionality, and writing of the steps.
2. Dry runs on the dev environment.
At this stage, we check how well the application meets the requirements and try to find as many errors as possible, including errors in the requirements.
3. Review by the responsible person (SIT Team Lead).
Stylistic and logical review.
4. Dry runs on the SIT environment.
At this stage, errors associated with installation and with the environment and its configuration are caught (by default, we assume that the SIT environment replicates PROD exactly or almost exactly). Successful completion of this stage means that the deployed version is stable and the release can be considered a candidate.
5. Customer review (Global QC).
Global QC verifies that the requirements are met and that the written tests comply with the company’s SOPs.
6. Approval (Global QC, Technical PO, Business PO).
7. Formal test script execution on the SIT environment (release candidate version).
After the tests are approved for execution (step 6) and the dry runs on the SIT environment are successfully completed (step 4), the test is formally executed on the SIT environment and the result is formally recorded. Bugs found during dry runs are not formal and are simply entered in Jira, as during development; for each bug found during formal execution, however, a separate defect form is created and included in the output documentation package for the product.
8. Test script execution review (SIT Team Lead).
The same as step 3.
9. Customer review (Global QC).
Global QC checks that the results are filled in correctly and completely, looks for possible errors, and checks that evidence is present (for example, screenshots). This is an important step, because it is Global QC that is responsible for providing the package of documents for external audit (by the FDA or customers).
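The gating in this life cycle (no formal execution before approval and a successful SIT dry run) can be sketched as a tiny state machine. This is an illustration under assumptions: the stage names and the strictly linear order are a simplification of the list above, and the `ScriptLifecycle` class is hypothetical, not part of any tool used on the project.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages in the order they appear in the life cycle described above."""
    WRITING = 1
    DRY_RUN_DEV = 2
    LEAD_REVIEW = 3
    DRY_RUN_SIT = 4
    CUSTOMER_REVIEW = 5
    APPROVAL = 6
    FORMAL_EXECUTION = 7
    EXECUTION_REVIEW = 8
    RESULTS_REVIEW = 9


class ScriptLifecycle:
    """Tracks which stages a test script has completed, in strict order."""

    def __init__(self, script_id: str):
        self.script_id = script_id
        self.completed = []  # stages finished so far, in order

    @property
    def next_stage(self):
        """The only stage that may be completed next, or None when all are done."""
        if len(self.completed) == len(Stage):
            return None
        return Stage(len(self.completed) + 1)

    def complete(self, stage: Stage) -> None:
        """Mark a stage as done; skipping or repeating stages is rejected."""
        if stage != self.next_stage:
            raise ValueError(
                f"{self.script_id}: cannot complete {stage.name}, "
                f"next expected stage is {self.next_stage}"
            )
        self.completed.append(stage)

    def ready_for_formal_execution(self) -> bool:
        """Formal execution needs both a successful SIT dry run and approval."""
        return (Stage.DRY_RUN_SIT in self.completed
                and Stage.APPROVAL in self.completed)


if __name__ == "__main__":
    lifecycle = ScriptLifecycle("TS-01")
    for stage in list(Stage)[:6]:
        lifecycle.complete(stage)
    print(lifecycle.ready_for_formal_execution(), lifecycle.next_stage.name)
```

A real process would also need a way to send a script back after a failed review; that is deliberately left out of this sketch.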
Working with Personal Data
Given that research is conducted using the double-blind* methodology, this is the least of our problems. But doctors’ data, company names, research protocol numbers, and so on must be changed. From a formal point of view, if we cannot get rid of sensitive data, we have to mask it in the screenshots, but this is a fairly standard and well-understood situation.
*double-blind: patients are randomly divided into two groups, one of which receives the study drug and the other a placebo or a drug with proven effectiveness. Neither the doctor nor the patient knows which group the patient was assigned to. This guards against doctor bias and the placebo effect. In the context of working with personal data, this means that in most cases the patient’s identity cannot be established from the data stored in the database or accessible to the user.
But the fact that this is the least of our problems does not mean that it cannot cause trouble. Here are a couple of rakes we stepped on in our projects (so that you do not have to):
Possibly funny (though not for us at the time): ‘The Globe’
In one of the applications, partly for the wow effect, we really needed an interactive globe that you can rotate with the mouse, switch between day and night, and so on. Markers are placed on the globe at addresses and colored according to the ranges their values fall into (a kind of color coding). Two hours before the demo, the database on the demo environment was anonymized, and the anonymizing script left us without zip codes, with all the consequences that entails.
Moral: do not touch the data two hours before the demo.
Story number two: ‘About anonymization’
Background: Data is collected in the repository from a large number of sources, data belong to different domains, but are interconnected by identifiers.
The story: data was loaded into the database and anonymized before being used for test purposes. It then turned out that data had not been loaded from all the necessary sources, so the missing part was loaded afterwards. It was not possible to link this second part (not yet anonymized) with the already anonymized first part. As a result, the start of work on the SIT environment was postponed for two weeks, during which the deployment and support teams corrected the data.
Moral: before anonymizing, you should make sure that the database contains everything that should be in it, and think in advance about the applicability of the anonymization and obfuscation mechanisms.
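One mechanism worth considering in advance is deterministic pseudonymization of the linking identifiers: if the same identifier always maps to the same replacement, a batch loaded later can still be joined to data that was anonymized earlier. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. It is an assumption-laden illustration, not the script used in the story: the function names, field names, and key handling are hypothetical, and a keyed hash is pseudonymization rather than full anonymization, so whether it is acceptable depends on the applicable rules.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes, length: int = 12) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash.

    The same value and key always give the same pseudonym. Truncating the
    digest shortens the value but raises the (small) chance of collisions.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def anonymize_rows(rows, id_fields, secret_key):
    """Return copies of the rows with the given identifier fields pseudonymized."""
    result = []
    for row in rows:
        masked = dict(row)  # do not modify the caller's data
        for name in id_fields:
            if masked.get(name) is not None:
                masked[name] = pseudonymize(str(masked[name]), secret_key)
        result.append(masked)
    return result


if __name__ == "__main__":
    key = b"keep-this-key-out-of-the-test-environment"  # hypothetical key

    # First batch, anonymized at load time.
    sites = anonymize_rows([{"site_id": "S-100", "country": "US"}], ["site_id"], key)

    # Second batch from another source, loaded and anonymized later with the same key.
    visits = anonymize_rows([{"visit_id": 1, "site_id": "S-100"}], ["site_id"], key)

    # The join on site_id still works because both batches share the pseudonym.
    print(sites[0]["site_id"] == visits[0]["site_id"])
```

The design choice here is that consistency comes from the key, not from a lookup table, so nothing has to be stored between loads except the key itself.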
Electronic vs paper workflow in practice
Electronic workflow somewhat simplifies communication and reduces the manual work in the review and approval process, but gives practically no gain in the time needed to prepare for testing and to execute it. Listed below are the pros and cons of electronic workflow versus paper, based on the example of HP ALM.
Pros:
- Easy to support.
- Less manual work.
- All team members have access to the current state of a particular test at any time.
- History of changes.
- History of executions.
- You can track the time taken to complete the test.
- Easy to plan future runs.
- It is hard to use the wrong version of a test script.
- Electronic signature.
Cons (specifically for HP ALM):
- A lot of time is spent on formatting scripts.
- Periodic problems with the tool itself.
- Not the best spell checker.
- Inconvenient interface.
- The time spent writing and reviewing tests is practically the same as for tests in Word.
A true story related to paperwork and manual signatures: ‘A Nightmare Before Release’
One evening, I wrote 450 times: ‘the color of the dots on the graph corresponds to the one stated in the requirements. Surname, name, date’ and signed each one, simply because we had printed on a black-and-white printer, and the color of the dots on the graphs mattered.
Moral: the choice of means should be consistent with the goals.
Another story: ‘Heavy is good, heavy is reliable.’©
Paper workflow is about paper (who would doubt it). The acceptance testing phase for an application that is far from the largest can amount to about five kilograms of paper.
The conclusion that suggests itself from the stories above (and many untold ones): despite the drawbacks of electronic workflow, listed and unlisted, if you have a choice, definitely choose electronic, even if it is HP ALM (which is no longer HP’s).
The large number of visualizations prevents stable test automation of the applications; therefore, in the initial approach, we limited ourselves to unit tests (including tests on the database side) and API tests, without any attempts to move towards E2E.
How and why did we come to at least partial automation?
The first task was to convince ourselves that in some cases we would really gain time. The doubt is understandable: not everyone believes in automation, and often its use does not pay off because it is applied in the wrong way, in the wrong place, and for the wrong purpose. But that is a topic for a separate discussion, of which there are somewhat fewer than discussions of the ‘automation is a must-have!!!’ kind, but still a lot.
The second was to explain to the customer that they would gain time and that the result would be sufficiently reliable and acceptable from a formal point of view. The industry is regulated, and there are reasons for this.
The third, and most important, was to define the algorithm by which we can make a conscious decision about automating the testing of a particular application or part of an application, and obtain consent for automation. This is important because the automation process should clearly be no less controlled than the described process for manual test scripts.
After we had explained and justified the first two points to ourselves and the customer, we wrote a testing strategy, asked the business analysts to add extra fields to the requirements, and compiled a set of questions whose answers let us decide what to include in the automation scope.
The list of questions in our case:
- Is it possible to automate the testing in this particular case?
- Is it a stable* component?
- How often do we need to execute test scripts for this component?
- Is it a business-critical function?**
- How hard is it to automate the testing?
* Stable = the component has not changed for some time and component changes are not planned for the next releases.
** The answer is filled in based on the value of the field added to the requirements by the business analyst.
In general, the decision-making process is as follows:
- At the input, we have requirements from the FRS.
We create a Design Matrix, where each row is an FRS requirement and the columns are:
- Requirement description
- Questions 1-5
- Team Decision
- Approximate estimate
- The team puts down answers to questions and on the basis of the received data determines whether it is worth automating testing of a specific requirement / group of requirements in full or in part.
- The customer reviews the proposed solution and approves the final version.
- After the Design Matrix is approved, the autotests are written. Scripts for autotests are written in Gherkin notation and undergo a review similar to that for manual tests (Global QC, Technical Owner, Business Owner). Step definitions, page objects, and so on are reviewed in pull requests, including by a technical specialist on the customer side. Autotest results and generated reports are reviewed and approved by Global QC.
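To make the Design Matrix more concrete, here is a sketch of one row and of a helper that proposes a decision from the answers to the five questions. The team's actual decision rule is not spelled out above, so everything in `propose_decision` (the order of checks, the `min_runs` threshold, the effort labels) is a hypothetical example; the result is only a proposal that the team and the customer would still review.

```python
from dataclasses import dataclass


@dataclass
class DesignMatrixRow:
    req_id: str
    description: str
    can_automate: bool        # question 1: is automation possible at all?
    stable: bool              # question 2: is the component stable?
    runs_per_release: int     # question 3: how often are the scripts executed?
    business_critical: bool   # question 4: filled from the business analyst's field
    effort: str               # question 5: "low", "medium" or "high"


def propose_decision(row: DesignMatrixRow, min_runs: int = 3) -> str:
    """Propose "automate" or "manual" for a requirement (assumed heuristic)."""
    if not row.can_automate or not row.stable:
        return "manual"
    if row.effort == "high" and not row.business_critical:
        return "manual"
    if row.business_critical or row.runs_per_release >= min_runs:
        return "automate"
    return "manual"


if __name__ == "__main__":
    rows = [
        DesignMatrixRow("FRS-010", "Load data from source A", True, True, 5, False, "low"),
        DesignMatrixRow("FRS-011", "Interactive chart", True, False, 5, True, "high"),
    ]
    for row in rows:
        print(row.req_id, propose_decision(row))
```

Encoding the rule as code mainly helps keep decisions consistent across requirements; the "Team Decision" column can still override the proposal.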
Within two years of introducing this approach, we switched to partial automation of acceptance testing for two applications related to loading, configuring, and displaying data in the data warehouse. In the near future, we plan to keep using the combined approach on other products developed for the customer, where possible.
In conclusion, I would like to note that formal acceptance testing is not something scary or useless, as it seems to many people in the industry. By taking full advantage of the scenario testing approach, it makes testing future versions easier, confirms the necessary and sufficient level of software quality, and ultimately reassures the customer. And what, if not the customer’s peace of mind, matters in outsourced development?