The "Connection Storm" Story at Stubhub
Stubhub, back in 2004-2007, was running a 4-node 9i RAC database on 32-bit CPUs. To
ease the resource constraints, the 4-node RAC was partitioned: two nodes served OLTP,
while the other two were reserved for broker uploads, 24x7 call center transactions,
internal reporting, etc. The application partitioning, together with different shared pool
and data cache configurations for the OLTP and DSS sides, completely eliminated the
I/O on the OLTP side and improved the user experience at the front end. To work around
the 32-bit Linux limitation on the SGA, an external data cache was configured to borrow
extra memory for the SGA on the OLTP side while the in-SGA data cache was trimmed;
the opposite was done on the DSS side, where the shared pool was cut to merely 200+ MB
and 2+ GB was allocated to the data cache.
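For illustration (instance names, sizes, and the exact VLM mechanism below are
assumptions, not the actual production settings), the per-instance split might have looked
like this in the shared init.ora, with the "external data cache" on the OLTP side
implemented through Oracle's indirect data buffers:

    # Hypothetical init.ora sketch -- instance names and sizes assumed.
    # OLTP nodes: bigger shared pool, in-SGA data cache trimmed, and the
    # 32-bit Linux VLM workaround (buffer cache mapped through /dev/shm).
    oltp1.shared_pool_size          = 800M
    oltp1.use_indirect_data_buffers = true
    oltp1.db_block_buffers          = 131072   # VLM uses db_block_buffers

    # DSS nodes: shared pool cut to ~200 MB, 2+ GB for the data cache.
    dss1.shared_pool_size = 200M
    dss1.db_cache_size    = 2048M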
The "connection storm" occurred at the turn of 2006-2007. In December 2006 around,
database already had problem with the lock-up of sys objects like OBJ$, SEQ$ etc. (The
lockup of OBJ$, SEQ$ was easily solved by identifying the sessions and killing those
locks.) At about the same time, maybe in January 2007, the app team launched a few
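Roughly along these lines (the TM-lock join is illustrative; the actual queries may have
differed):

    -- Find sessions holding DML (TM) locks on OBJ$/SEQ$; for TM locks,
    -- v$lock.id1 carries the object_id.
    SELECT s.sid, s.serial#, s.username, l.lmode
      FROM v$lock l, v$session s, dba_objects o
     WHERE l.sid  = s.sid
       AND l.type = 'TM'
       AND l.id1  = o.object_id
       AND o.owner = 'SYS'
       AND o.object_name IN ('OBJ$', 'SEQ$');

    -- Kill an offender (sid and serial# come from the query above):
    ALTER SYSTEM KILL SESSION '123,4567';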
At about the same time, in January 2007 or so, the app team launched a few more
application blades, which made the aggregate connection pool count against the backend
OLTP database nodes jump. Compounding the issue was a code release involving nine
LOGISTICS_***-prefixed tables that completely changed the behavior of the database:
the nine tables and their indexes came to generate 95-98% of the logical reads of the
entire database.
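A figure like that can be confirmed straight from the segment-level statistics
(v$segment_statistics is available from 9.2 onward); a sketch, assuming the nine tables
and their indexes all carry the LOGISTICS_ prefix:

    -- Share of database-wide logical reads owned by LOGISTICS_* segments
    -- (tables and their indexes combined):
    SELECT SUM(CASE WHEN object_name LIKE 'LOGISTICS\_%' ESCAPE '\'
                    THEN value ELSE 0 END) / SUM(value) AS logistics_share
      FROM v$segment_statistics
     WHERE statistic_name = 'logical reads';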
From day one, the Stubhub database had exhibited the "UNDO enqueue" as the most
serious lock of all, a consequence of the database having been created with a 16k block
size. Whenever the nine LOGISTICS_***-prefixed tables were modified by the business
people on the backend nodes, the OLTP users had to fetch UNDO blocks across the
interconnect to build consistent read images. With the nine tables and their indexes
generating 95-98% of the logical reads of the entire database, the "UNDO enqueue"
problem was badly exacerbated: a momentary freeze while resolving it would hang
hundreds of user sessions on the OLTP nodes. That hang produced a false "connection
storm", which people easily mistook for a real attack.
Things became difficult at Stubhub when literally hundreds of new people jumped on the
bandwagon at the same time, on the rumor that the company was about to be sold. After
1-2 months of debate and stress testing, the nine LOGISTICS_***-prefixed tables and
their indexes were moved to a separate 4k tablespace (mechanics sketched below). The
remarkable thing was that, whether at the original 16k or the new 4k block size, the
logical reads for the nine tables stayed about the same, while response time improved by
about 30-40% for some queries involving those tables. Once the deployment was in, the
frequency of the false "connection storm" dropped to about twice per week from more
than twice per day. The complication, however, was that at about the time the nine
LOGISTICS_***-prefixed tables were moved to the 4k tablespace in March, the SWAT
team, which had been created to deal with the false denial-of-service claims and the
outages but consisted mostly of non-database staff with a purported tenure of half a year
[till June 30, 2007], had pushed for DCD (dead connection detection) to be deployed
ahead of the 4k tablespace move. So although the "connection storm" lessened in
intensity, the credit for the mitigation never got ascribed to the 4k change.
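For the record, the mechanics of such a move in 9i (which supports multiple block sizes
in one database) run roughly as follows; the cache size, file path, and object names are
placeholders, since only the LOGISTICS_*** prefix is given here:

    -- A 4k buffer cache must exist before a 4k tablespace can be created
    -- (assumes an spfile is in use):
    ALTER SYSTEM SET db_4k_cache_size = 256M SCOPE=BOTH;

    CREATE TABLESPACE logistics_4k
      DATAFILE '/u02/oradata/stub/logistics_4k01.dbf' SIZE 4096M
      BLOCKSIZE 4K
      EXTENT MANAGEMENT LOCAL;

    -- Relocate each table, then rebuild its (now unusable) indexes:
    ALTER TABLE logistics_orders MOVE TABLESPACE logistics_4k;
    ALTER INDEX logistics_orders_pk REBUILD TABLESPACE logistics_4k;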
From April 2007 onward, the DCD implementation began to take a toll on the database.
DCD, because its mechanism pings the Java connection pools, caused idle connections to
the OLTP database nodes to jump to 200-300 and more, from previous levels of about
100-110 connections per OLTP node. In effect, DCD defeated the connection pool
mechanism: connections never shrank once spawned. The extra hundreds of idle
connections, besides eating away at memory, became deadly when the "UNDO
enqueue"-induced hang recurred. Once hundreds of idle connections turned active within
1-2 seconds, the whole RAC went down. Previously, when the hang occurred, it took half
a minute or so for new connections to be created and fired at the OLTP DB nodes, which
in a sense blunted the severity of the assault on the database.
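For reference, server-side DCD in Oracle Net is enabled with a single sqlnet.ora
parameter (the probe interval used at Stubhub is not recorded here; 10 minutes is an
assumption), and the idle-connection creep can be watched per instance:

    # sqlnet.ora on the database servers -- probe clients every 10 minutes:
    SQLNET.EXPIRE_TIME = 10

    -- Idle (INACTIVE) user connections per RAC instance:
    SELECT inst_id, COUNT(*) AS idle_sessions
      FROM gv$session
     WHERE type = 'USER' AND status = 'INACTIVE'
     GROUP BY inst_id;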
Now, why did the RAC database fail to handle the surge of connections? The causes lay
at both the database level and the OS level. In April, Oracle support pinpointed the need
to reduce the stack_size parameter. While confirming the "UNDO enqueue" in some of
the systemstate dumps provided, Oracle support pointed out that the 9i cluster manager
relied on the OS-level stack size to handle simultaneous connections to the database. But
since the SWAT team took the now-infrequent "connection storm" as proof that DCD had
done its job, the stack_size change Oracle suggested was ignored.
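Exactly where stack_size lived is not spelled out here; on 32-bit Linux the usual lever is
the per-process stack limit, so one plausible (assumed) shape of the change is:

    # Assumed form of the fix: lower the soft stack limit so each
    # connection thread claims less of the 32-bit address space.  Set it
    # in the shell that starts the cluster manager and listeners:
    ulimit -s 2048                    # stack limit in KB

    # Or persistently in /etc/security/limits.conf:
    #   oracle  soft  stack  2048
    #   oracle  hard  stack  2048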
On July 1, 2007, a new SWAT manager and new engineers came on board. After about
three weeks of discussion, with 20-30 people jamming the conference room, the decision
was made to reverse the DCD deployment and implement the stack_size change. Once the
stack_size was changed, the whole RAC database successfully withstood the "connection
storms" that still arrived at maybe 1-2 times per week. The reason stack_size mattered so
much was that, under the former setting, the cluster manager could not handle more than
200+ concurrent connections; with the new setting, it could theoretically handle
1000-2000. Hence, when the "UNDO enqueue"-induced hang recurred, the application
blades, detecting the momentary freeze of the database, could spawn and throw hundreds
of new connections at the database in a gradual, orderly manner, and then shrink the
connection pool counts once the freeze was over. (The ultimate fix for the database hang
was the upgrade to 9.2.0.8, which purportedly fixed the "UNDO enqueue" issue.)
