June 4, 2026

What Happens During a Web Application Penetration Test? A Step-by-Step Walkthrough

Ivan Stanev

Founder & Senior Security Researcher

Manual web application penetration testing is a security assessment in which certified testers attempt to exploit vulnerabilities in a web application using the same techniques a real attacker would use, without relying solely on automated scanners. Unlike vulnerability scans, a manual test verifies exploitability, chains multiple weaknesses into attack paths, and assesses business logic flaws that automated tools cannot detect. The result is a confirmed, prioritised list of vulnerabilities with proof-of-concept evidence and remediation guidance.

If you have signed off on a Statement of Work and are now wondering what actually happens between kickoff and final report delivery, this walkthrough covers the full engagement lifecycle. Each phase maps to specific techniques, tooling decisions, and the types of findings it is designed to surface.

The Seven Phases of a Manual Web Application Penetration Test

A structured engagement runs through these phases in sequence, though reconnaissance and enumeration often continue in parallel with active testing as new attack surface is discovered.

Scoping and pre-engagement
Reconnaissance and OSINT
Application mapping and enumeration
Authentication, authorisation, and session testing
Business logic and functional testing
Exploitation and chaining
Reporting and remediation validation

Phase 1: Scoping and Pre-Engagement

Before any testing begins, the engagement is defined in writing. This phase establishes the test scope (domains, environments, IP ranges), the rules of engagement (testing windows, rate limits, out-of-scope systems), and the testing methodology (black box, grey box, or white box).

Grey box is the default for most SaaS clients. Testers receive standard user credentials and basic application documentation, which lets them skip the early guessing stages and spend their time on deeper, more meaningful attack paths. Black box is appropriate when the client wants to simulate an external attacker with zero prior knowledge. White box includes source code, architecture diagrams, and admin credentials, and is most appropriate for pre-launch security reviews.

A vague scoping document costs you. If the staging environment is not explicitly included, the tester will not test it. If a third-party integration is not listed, it is out of scope. Before signing off, make sure the target list includes every subdomain, API endpoint group, and environment that matters.
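
Scope is easiest to enforce when the target list is treated as data and anything unlisted is denied by default. Below is a minimal Python sketch. The hostnames and the glob-pattern convention for inclusions and exclusions are made up for illustration and do not come from a real scoping document.

```python
from fnmatch import fnmatch

# Hypothetical scope taken from a signed scoping document.
# Anything not matched by an include pattern is out of scope by default.
IN_SCOPE = ["app.example.com", "api.example.com", "*.staging.example.com"]
OUT_OF_SCOPE = ["payments.staging.example.com"]  # e.g. a third-party integration


def is_in_scope(host: str) -> bool:
    """Return True only if host matches an include and no exclusion."""
    host = host.lower().rstrip(".")
    if any(fnmatch(host, pattern) for pattern in OUT_OF_SCOPE):
        return False
    return any(fnmatch(host, pattern) for pattern in IN_SCOPE)


for candidate in ["api.example.com", "qa.staging.example.com",
                  "payments.staging.example.com", "staging.example.com"]:
    print(candidate, is_in_scope(candidate))
```

Note that `staging.example.com` itself fails the check. The wildcard only covers hosts beneath it, and that is exactly the kind of gap worth catching before sign-off.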

Phase 2: Reconnaissance and OSINT

Reconnaissance maps everything publicly discoverable about the target before the tester sends a single authenticated request. This phase surfaces forgotten assets, exposed credentials, and technology fingerprints that shape the rest of the engagement.

Common reconnaissance activities include:

Subdomain enumeration: passive sources (Certificate Transparency logs, SecurityTrails, Shodan) combined with active brute-forcing using wordlists tuned to the client's industry.
Technology fingerprinting: identifying the application server, framework version, CDN, WAF presence, and JavaScript libraries via response headers, HTML comments, and file paths.
OSINT for credentials: checking breach databases and public GitHub repositories for leaked API keys, credentials, or environment files belonging to the target organisation.
Historical content analysis: Wayback Machine scraping to find endpoints that were removed from navigation but not from the server.
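
The passive side of subdomain enumeration can be sketched as parsing Certificate Transparency results. This Python sketch assumes the JSON layout returned by crt.sh's `output=json` mode, in which each entry's `name_value` field holds one or more newline-separated names. The sample records and domain are made up, and the network fetch is left out so that the parsing logic stands on its own.

```python
import json


def subdomains_from_ct(ct_json: str, apex: str) -> list[str]:
    """Extract unique hostnames under `apex` from CT log JSON.

    Assumes a list of objects with a `name_value` field that may contain
    several newline-separated names (the crt.sh JSON layout).
    """
    apex = apex.lower()
    found = set()
    for entry in json.loads(ct_json):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # a wildcard cert still reveals the parent name
            if name == apex or name.endswith("." + apex):
                found.add(name)
    return sorted(found)


sample = json.dumps([
    {"name_value": "example.com\nwww.example.com"},
    {"name_value": "*.staging.example.com"},
    {"name_value": "legacy-admin.example.com"},
    {"name_value": "unrelated.example.org"},
])
print(subdomains_from_ct(sample, "example.com"))
# ['example.com', 'legacy-admin.example.com', 'staging.example.com', 'www.example.com']
```

The resulting names then go on to resolution and to the active brute-forcing step.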

Reconnaissance findings directly influence which attack vectors get prioritised. An application running an outdated version of a framework with a known deserialization vulnerability moves to the top of the queue.
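
Technology fingerprinting often starts with response headers. The Python sketch below extracts name/version tokens from common headers such as `Server` and `X-Powered-By` and flags any version that appears on a watch list. The watch list is a hypothetical, hand-maintained table; it is not a real vulnerability feed. Real fingerprinting also weighs HTML, JavaScript, and file-path evidence, and header values can be stripped or spoofed.

```python
import re

# Hypothetical watch list: component -> versions the team wants to prioritise.
PRIORITY_VERSIONS = {
    "php": {"7.4.3"},
    "apache": {"2.4.49"},
}

HEADERS_TO_CHECK = ("Server", "X-Powered-By")
# Matches tokens like "Apache/2.4.49" or "PHP/7.4.3".
TOKEN = re.compile(r"([A-Za-z][\w.-]*)/(\d+(?:\.\d+)*)")


def fingerprint(headers: dict[str, str]) -> list[tuple[str, str, bool]]:
    """Return (component, version, prioritise?) tuples found in headers."""
    results = []
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    for header in HEADERS_TO_CHECK:
        value = lowered.get(header.lower(), "")
        for name, version in TOKEN.findall(value):
            name = name.lower()
            flagged = version in PRIORITY_VERSIONS.get(name, set())
            results.append((name, version, flagged))
    return results


print(fingerprint({"Server": "Apache/2.4.49 (Unix)", "X-Powered-By": "PHP/8.2.1"}))
# [('apache', '2.4.49', True), ('php', '8.2.1', False)]
```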

Phase 3: Application Mapping and Enumeration

Mapping builds a complete picture of the application's attack surface before exploitation begins. Testers spider the application, capture all traffic through an intercepting proxy (Burp Suite is standard), and catalogue every endpoint, parameter, file upload handler, and API route.
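
A catalogue is easier to reason about once concrete URLs are collapsed into endpoint templates. This minimal Python sketch makes two assumptions: captured requests are available as (method, URL) pairs, for example exported from the proxy history, and numeric or UUID path segments represent object IDs. Both are simplifications, and JSON body parameters are not covered here.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def template(path: str) -> str:
    """Replace numeric or UUID path segments with an {id} placeholder."""
    return "/".join("{id}" if ID_SEGMENT.match(p) else p for p in path.split("/"))


def catalogue(requests: list[tuple[str, str]]) -> dict[tuple[str, str], set[str]]:
    """Map (method, path template) -> query parameter names seen."""
    endpoints = defaultdict(set)
    for method, url in requests:
        parts = urlsplit(url)
        key = (method.upper(), template(parts.path))
        endpoints[key].update(parse_qs(parts.query, keep_blank_values=True))
    return dict(endpoints)


captured = [
    ("GET", "https://app.example.com/api/v1/invoices/1042?format=pdf"),
    ("GET", "https://app.example.com/api/v1/invoices/1089"),
    ("POST", "https://app.example.com/api/v1/uploads"),
]
for (method, path), params in sorted(catalogue(captured).items()):
    print(method, path, sorted(params))
```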

For API-heavy SaaS applications, this phase includes:

Importing any available OpenAPI or Swagger specifications and comparing documented routes against what the server actually responds to.
Identifying unauthenticated endpoints that should require authentication.
Mapping parameter types across GET and POST requests, including hidden fields, JSON body parameters, and GraphQL query structures.
Identifying role-based access differences by comparing responses from accounts at different privilege levels.
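
Comparing a specification against observed behaviour is mostly set arithmetic. This Python sketch assumes the OpenAPI document has already been loaded into a Python dict, and only its `paths` object is used. It also assumes a hypothetical set of (method, path template) pairs that the server answered with something other than 404. Routes the server serves but the specification omits are often the interesting ones.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def documented_routes(spec: dict) -> set[tuple[str, str]]:
    """Collect (METHOD, path) pairs from an OpenAPI `paths` object."""
    routes = set()
    for path, item in spec.get("paths", {}).items():
        for key in item:
            if key.lower() in HTTP_METHODS:  # skip "parameters", "summary", etc.
                routes.add((key.upper(), path))
    return routes


def normalise(path: str) -> str:
    """Treat any {param} segment as equivalent, e.g. {invoiceId} == {id}."""
    return "/".join("{id}" if s.startswith("{") and s.endswith("}") else s
                    for s in path.split("/"))


def diff_routes(spec: dict, observed: set[tuple[str, str]]) -> dict[str, list]:
    documented = {(m, normalise(p)) for m, p in documented_routes(spec)}
    seen = {(m.upper(), normalise(p)) for m, p in observed}
    return {
        "undocumented_but_live": sorted(seen - documented),
        "documented_not_observed": sorted(documented - seen),
    }


spec = {"paths": {
    "/api/v1/invoices/{invoiceId}": {"get": {}, "parameters": []},
    "/api/v1/reports": {"post": {}},
}}
observed = {("GET", "/api/v1/invoices/{id}"), ("DELETE", "/api/v1/invoices/{id}")}
print(diff_routes(spec, observed))
```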

Phase 4: Authentication, Authorisation, and Session Testing

This phase targets the mechanisms that are supposed to control who can do what inside the application. It is where the most critical findings in web application testing tend to surface, particularly in multi-tenant SaaS products.

Authentication testing

Testers evaluate the login mechanism for weak lockout policies, username enumeration via differing error messages or response times, insecure password reset flows, and multi-factor authentication bypass techniques. If OAuth or SAML is in use, the implementation is tested for common misconfigurations, including open redirects in the redirect_uri parameter and state parameter predictability.
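
Username enumeration checks come down to comparing how the login endpoint responds to a known-valid and a known-invalid username. This minimal Python sketch assumes responses have already been collected as hypothetical (status, body, elapsed seconds) samples. It uses an arbitrary 50 ms threshold on the median timing gap. In practice, testers repeat each request many times, because network jitter can easily swamp small timing differences.

```python
from statistics import median
from typing import NamedTuple


class Sample(NamedTuple):
    status: int
    body: str
    elapsed: float  # seconds


def enumeration_signals(valid: list[Sample], invalid: list[Sample],
                        timing_threshold: float = 0.05) -> list[str]:
    """Return reasons the two username classes look distinguishable."""
    signals = []
    if {s.status for s in valid} != {s.status for s in invalid}:
        signals.append("status codes differ")
    if {s.body for s in valid} != {s.body for s in invalid}:
        signals.append("response bodies differ")
    gap = median(s.elapsed for s in valid) - median(s.elapsed for s in invalid)
    if abs(gap) > timing_threshold:
        signals.append(f"median timing differs by {gap * 1000:.0f} ms")
    return signals


# Hypothetical samples: identical status and message, but existing users are
# slower, e.g. because a password hash is only computed for real accounts.
valid = [Sample(401, "Invalid credentials", t) for t in (0.31, 0.29, 0.33)]
invalid = [Sample(401, "Invalid credentials", t) for t in (0.08, 0.09, 0.07)]
print(enumeration_signals(valid, invalid))
```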

Authorisation and access control testing

Broken Object Level Authorisation (BOLA) is the most common critical finding in API-heavy applications. A tester checks whether a low-privilege user can access or modify another user's resources by manipulating object identifiers in API calls.

A typical BOLA test sequence looks like this:

# Authenticated as User A, who owns invoice 1042
GET /api/v1/invoices/1042 -> 200 OK (expected)

# Substitute User B's invoice ID, still using User A's session
GET /api/v1/invoices/1089 -> 200 OK (BOLA confirmed)

# The server returns User B's invoice data to User A:
# it checks that the session is valid, not that User A owns the object
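
Once the manual sequence has been confirmed by hand, it can be repeated across many object IDs. This minimal Python sketch assumes a hypothetical `fetch(path, token)` callable that returns an HTTP status code; in practice, it would wrap an HTTP client routed through the intercepting proxy. The endpoint format reuses the example invoice IDs, and the fake server is for illustration only.

```python
from typing import Callable

Fetch = Callable[[str, str], int]  # (path, bearer token) -> HTTP status


def bola_check(fetch: Fetch, token_a: str, foreign_ids: list[int],
               path_fmt: str = "/api/v1/invoices/{}") -> list[int]:
    """Return IDs owned by another user that User A's session can read."""
    exposed = []
    for object_id in foreign_ids:
        status = fetch(path_fmt.format(object_id), token_a)
        if status == 200:
            exposed.append(object_id)  # candidate BOLA: verify the body by hand
    return exposed


# Fake server for illustration: checks only that the session exists,
# never that the invoice belongs to the caller.
def vulnerable_fetch(path: str, token: str) -> int:
    if token not in {"token-user-a", "token-user-b"}:
        return 401
    return 200 if path.rsplit("/", 1)[-1] in {"1042", "1089"} else 404


print(bola_check(vulnerable_fetch, "token-user-a", [1089, 2001]))  # [1089]
```

A 200 response is only a candidate finding. Some APIs return 200 with an empty or error body, so the tester still confirms that the response actually contains User B's data.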

Broken Function Level Authorisation (BFLA) is tested separately: can a standard user call admin-only API functions by directly accessing the endpoint, bypassing the UI that hides those controls?
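
BFLA testing is often organised as a role-by-function matrix. For every function, the matrix records which roles should be allowed and what the server actually returned for each role's session. This minimal Python sketch assumes a hypothetical `fetch(method, path, token)` callable that returns an HTTP status code. The roles and endpoints are made up, and the rule that any 2xx response counts as "allowed" is a simplification.

```python
from typing import Callable

# Hypothetical expected permissions: (method, path) -> roles allowed.
EXPECTED = {
    ("GET", "/api/v1/invoices"): {"user", "admin"},
    ("POST", "/api/v1/admin/users"): {"admin"},
    ("DELETE", "/api/v1/admin/tenants/7"): {"admin"},
}


def bfla_violations(fetch: Callable[[str, str, str], int],
                    tokens: dict[str, str]) -> list[tuple[str, str, str, int]]:
    """Return (role, method, path, status) where a role got through unexpectedly."""
    violations = []
    for (method, path), allowed in EXPECTED.items():
        for role, token in tokens.items():
            status = fetch(method, path, token)
            if 200 <= status < 300 and role not in allowed:
                violations.append((role, method, path, status))
    return violations


# Fake server: the UI hides admin actions, but the API only checks the role
# on the tenant-deletion route.
def fake_fetch(method: str, path: str, token: str) -> int:
    if path.startswith("/api/v1/admin/tenants") and token != "admin-token":
        return 403
    return 200


print(bfla_violations(fake_fetch, {"user": "user-token", "admin": "admin-token"}))
# [('user', 'POST', '/api/v1/admin/users', 200)]
```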

Phase 5: Business Logic and Functional Testing

Automated scanners fail almost entirely at business logic testing. This phase requires a tester to understand what the application is supposed to do and then probe the gaps between that intent and the actual implementation.

Examples of business logic flaws found in real SaaS engagements include:

Negative quantity manipulation in e-commerce flows that credit the attacker's account rather than debiting it.
Price parameter tampering, where the client sends the unit price to the server and the server trusts it without verification.
Multi-step workflow bypass where a user can skip step two of a three-step verification process by jumping directly to step three.
Race conditions in account top-up or coupon redemption endpoints that allow the same resource to be consumed twice by sending concurrent requests.
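
Race-condition testing depends on getting several requests to the server as close together as possible. This minimal Python harness uses plain threads and a `threading.Barrier` so that every worker fires at once. `redeem` is a hypothetical callable that sends one redemption request and returns True on success. Thread-level timing is coarse, so a clean result here does not prove the endpoint is safe.

```python
import threading
from typing import Callable


def race(redeem: Callable[[], bool], attempts: int = 10) -> int:
    """Fire `attempts` concurrent calls; return how many succeeded."""
    barrier = threading.Barrier(attempts)
    results = [False] * attempts

    def worker(i: int) -> None:
        barrier.wait()  # release every thread at (roughly) the same moment
        results[i] = redeem()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


# A correctly guarded single-use coupon: check and update happen atomically.
class SafeCoupon:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._used = False

    def redeem(self) -> bool:
        with self._lock:
            if self._used:
                return False
            self._used = True
            return True


print(race(SafeCoupon().redeem))  # 1
```

Against a single-use coupon endpoint, any count above 1 is worth investigating by hand. `SafeCoupon` shows the server-side property that prevents the flaw: the check and the update happen as one atomic step. In a real service, that might be a database transaction or an atomic conditional update.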

These vulnerabilities do not show up in a scanner report. They require a tester who has read the product documentation, created legitimate accounts, and understands the financial or operational impact of a successful exploitation.

Phase 6: Exploitation and Attack Chaining

Active exploitation confirms that a vulnerability is genuinely exploitable, not just theoretically present. This is the distinction that separates a manual penetration test from a vulnerability scan. Testers attempt to achieve proof-of-concept exploitation for every confirmed vulnerability, stopping at the agreed scope boundary.

Attack chaining is where manual testing provides compounding value. Individual vulnerabilities that appear low severity in isolation can become critical when combined:

Vulnerability A	Vulnerability B	Chained Impact
SSRF on admin import endpoint	Internal metadata service accessible	Cloud provider credentials exfiltration
Stored XSS in user profile field	Admin views all profiles in dashboard	Session token theft, account takeover
IDOR on document ID	Document viewer renders PDFs server-side	Local file inclusion via crafted PDF path
OAuth redirect_uri not validated	User token returned in redirect	Account takeover without credentials

Each chain represents a realistic attacker path. Reporting the vulnerabilities individually would understate the actual risk. Chaining them demonstrates real-world business impact.
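
The SSRF chain in the table works because the import endpoint fetches whatever URL it is given, and cloud metadata services on several major providers listen on the link-local address 169.254.169.254. Remediation guidance for that chain typically includes resolving the destination and refusing any address that is not publicly routable. Below is a minimal Python sketch of that check using the standard `ipaddress` module. The `resolve` parameter is injectable so that the logic can be exercised without DNS. A production fix would also need to pin the resolved address for the actual request, which guards against DNS rebinding, and re-check any redirects.

```python
import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit


def system_resolve(host: str) -> Iterable[str]:
    """Resolve a hostname to all of its IP address strings via the OS resolver."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_safe_import_url(url: str,
                       resolve: Callable[[str], Iterable[str]] = system_resolve) -> bool:
    """Allow only http(s) URLs whose every resolved address is globally routable."""
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False
    try:
        addresses = [ipaddress.ip_address(a) for a in resolve(parts.hostname)]
    except (OSError, ValueError):
        return False  # unresolvable or malformed: refuse rather than guess
    # is_global is False for private, loopback, and link-local ranges,
    # which covers the 169.254.169.254 metadata address.
    return bool(addresses) and all(a.is_global for a in addresses)


fake_dns = {"files.example.net": ["93.184.216.34"],
            "meta.example.net": ["169.254.169.254"]}


def fake_resolve(host: str) -> list[str]:
    return fake_dns.get(host, [])


print(is_safe_import_url("https://files.example.net/export.csv", fake_resolve))  # True
print(is_safe_import_url("http://meta.example.net/latest/", fake_resolve))       # False
```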

Phase 7: Reporting and Remediation Validation

A penetration test is only as useful as its report. The deliverable from an IVASTA engagement includes a full technical report with an executive summary, a risk-rated findings register, and proof-of-concept evidence for every confirmed vulnerability.

How findings are structured

Each finding contains: a CVSS score with the scoring rationale, a description of the vulnerability class, step-by-step reproduction instructions, a screenshot or terminal output confirming exploitation, the business impact of successful exploitation, and specific remediation guidance written for your development team.

Findings are rated on a five-tier scale: Critical, High, Medium, Low, and Informational. A Critical finding means an unauthenticated attacker can achieve account takeover, data exfiltration, or remote code execution with a single exploit. Informational findings are documented for completeness but carry no immediate remediation obligation.
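
The CVSS portion of each finding is reproducible arithmetic. The Python sketch below implements the CVSS v3.1 base-score equations and maps the result to the five tiers. It assumes those tiers follow the CVSS v3.1 qualitative rating scale, with a 0.0 score (CVSS "None") treated as Informational. Temporal and environmental metrics are ignored.

```python
# CVSS v3.1 base-metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
# Privileges Required depends on whether Scope is Unchanged (U) or Changed (C).
PR = {"U": {"N": 0.85, "L": 0.62, "H": 0.27},
      "C": {"N": 0.85, "L": 0.68, "H": 0.5}}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest value with one decimal place >= x."""
    i = round(x * 100000)
    return i / 100000.0 if i % 10000 == 0 else (i // 10000 + 1) / 10.0


def base_score(vector: str) -> float:
    m = dict(part.split(":") for part in vector.split("/"))
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    changed = m["S"] == "C"
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["S"]][m["PR"]] * UI[m["UI"]]
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


def severity(score: float) -> str:
    if score == 0.0:
        return "Informational"  # assumption: CVSS "None" maps to Informational
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


bola = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:N/A:N"
print(base_score(bola), severity(base_score(bola)))  # 6.5 Medium
```

Scoring the earlier BOLA example as a network-reachable read of another user's data by a low-privilege user lands it in Medium. Chaining it with other findings, or showing write access, is what pushes a real finding higher.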

Remediation validation

After developers address findings, a remediation validation (also called a retest) confirms that each fix works as intended and has not introduced a regression. IVASTA includes one round of remediation validation within the engagement scope for all Critical and High severity findings.

Why Manual Testing Finds What Scanners Miss

Automated vulnerability scanners are good at pattern matching against known signatures. They reliably identify unpatched software versions, basic SQL injection in unsanitised GET parameters, and common misconfigurations. They cannot reason about application behaviour.

Testing Capability	Automated Scanner	Manual Penetration Test
Known CVEs and patch gaps	Yes	Yes
SQL injection (simple GET params)	Yes	Yes
Business logic flaws	No	Yes
BOLA / IDOR across user accounts	Rarely	Yes
Multi-step workflow bypass	No	Yes
Chained attack paths	No	Yes
OAuth / SAML implementation flaw	No	Yes
Race condition exploitation	No	Yes
Custom payload crafting	No	Yes

IVASTA's testers hold OSCP certifications and conduct every engagement manually. Automated tooling is used for enumeration and traffic capture, not for the determination of exploitability. A scanner that reports a reflected XSS as a finding has not confirmed the XSS is exploitable through the WAF, or that it can reach a logged-in admin session. A tester has.

Request a Scoping Call

If you are preparing for a SOC 2 Type II audit, a customer security review, or a pre-launch security assessment, a web application penetration test is the most direct way to find and fix vulnerabilities before an attacker does. IVASTA will scope your engagement within 24 hours and deliver a proposal within 48 hours. Request a scoping call at IVASTA Security and a senior tester will reach out directly.

Frequently Asked Questions

What is manual web application penetration testing?

Manual web application penetration testing is a security assessment conducted by certified testers who attempt to exploit vulnerabilities in a web application using real-world attacker techniques. Unlike automated vulnerability scanning, manual testing verifies that vulnerabilities are actually exploitable, tests business logic flaws, and chains multiple weaknesses together to demonstrate compound attack paths. The output is a confirmed, evidence-backed findings report with remediation guidance.

How long does a web application penetration test take?

A standard web application penetration test for a mid-size SaaS product with two or three user roles and a documented API takes between five and ten business days of active testing. The timeline depends on the scope: the number of unique functionalities, the presence of an API, the complexity of the authentication model, and whether source code is provided. The full engagement, including reporting and a remediation validation round, typically spans three to four weeks from kickoff to final report delivery.

What is the difference between a vulnerability scan and a penetration test?

A vulnerability scan runs automated tools that identify known vulnerabilities by matching software signatures and configurations against a database of CVEs and misconfigurations. It does not attempt to exploit anything, does not test business logic, and cannot assess whether two low-severity issues combine into a critical attack path. A penetration test involves a tester who actively attempts to exploit vulnerabilities, chains findings together, and confirms real-world impact. PCI DSS explicitly requires penetration testing in addition to vulnerability scanning, and SOC 2 auditors commonly expect to see a penetration test, because only a test that attempts exploitation shows whether a weakness can actually be abused.

Can automated scanning replace a penetration test?

Automated scans and penetration tests serve different purposes. A scan monitors your known attack surface for signature-based issues on a continuous or scheduled basis. A penetration test applies human reasoning to find business logic flaws, access control breakdowns, and chained attack paths that scanners cannot detect. Most enterprise customers and compliance auditors require an annual penetration test from a qualified third party in addition to ongoing scanning. If a scan is the only test you are running, your coverage has significant gaps.

What do I need to prepare before a penetration test?

Before the engagement begins, you need a signed Statement of Work with an agreed scope document, test accounts at every privilege level your application supports (standard user, admin, API-only, etc.), a rules of engagement document specifying testing windows and any off-limits systems, and a point of contact who can answer tester questions during the engagement. If you are testing an API, providing an up-to-date OpenAPI specification significantly improves coverage depth. Staging environment access is strongly preferred over production testing.

What are the most common findings in a web application penetration test?

The most common critical and high severity findings in web application penetration tests are broken access control issues (including BOLA, BFLA, and insecure direct object references), authentication bypasses, business logic flaws, server-side request forgery (SSRF), injection vulnerabilities such as SQL and command injection, and insecure API implementations. Cross-site scripting (XSS) and security misconfiguration findings are frequent at medium severity. The specific mix depends on the application's stack and architecture, which is why scope definition and methodology documentation matter before the first request is sent.