memory - performance drop and strange GC behaviour after one hour in jboss 5.0.1GA -


We upgraded our software from JBoss 4.0.5.GA to 5.0.1.GA and noticed that after 1 hour or so (or 90 minutes in some cases) performance drops dramatically.

At the same moment, the garbage collector logs show minor garbage collection times jumping from 0.01s to ~1.5s, with the amount of heap being cleared each time reducing from ~400MB before to ~300MB after. (See GC viewer graph 1)

[Image: GC graph]

We think these are both symptoms of the same underlying root cause.

The JVM settings are:
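The jump in minor collection times can be pulled out of gc.log mechanically rather than read off a graph. Below is a minimal sketch, assuming the usual `-XX:+PrintGCDetails` line shape of the Java 6 collectors (`[GC [...] <before>K-><after>K(<total>K), <pause> secs]`). The class name `MinorGcLine` and the sample line in `main` are made up for illustration, and the pattern may need adjusting for other collectors or locales.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinorGcLine {
    // Matches minor collections only: "[Full GC" lines do not contain "[GC ".
    // Captures whole-heap usage before/after (in KB) and the pause in seconds.
    private static final Pattern MINOR_GC = Pattern.compile(
            "\\[GC .*?(\\d+)K->(\\d+)K\\(\\d+K\\), ([0-9.]+) secs\\]");

    /** Returns {freedKb, pauseMillis}, or null if the line is not a minor GC line. */
    static long[] parse(String line) {
        Matcher m = MINOR_GC.matcher(line);
        if (!m.find()) {
            return null;
        }
        long beforeKb = Long.parseLong(m.group(1));
        long afterKb = Long.parseLong(m.group(2));
        long pauseMillis = Math.round(Double.parseDouble(m.group(3)) * 1000);
        return new long[] { beforeKb - afterKb, pauseMillis };
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the real log.
        String sample = "2013-10-01T12:00:00.123+0200: [GC [PSYoungGen: 393216K->12000K(393216K)] "
                + "800000K->418784K(2097152K), 0.0123450 secs]";
        long[] r = parse(sample);
        System.out.println("freed=" + r[0] + "KB pause=" + r[1] + "ms");
    }
}
```

Running `parse` over every line of the log and printing the two values with their timestamps would show when the freed amount drops and the pause grows.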

-server -Xms2048m -Xmx2048m -XX:NewSize=384m -XX:MaxNewSize=384m -XX:SurvivorRatio=4 -XX:MinHeapFreeRatio=11 -XX:PermSize=80m -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+DisableExplicitGC -Djava.awt.headless=true -DUseSunHttpHandler=true -Dsun.net.client.defaultConnectTimeout=25000 -Dsun.net.client.defaultReadTimeout=50000 -Dfile.encoding=utf-8 -Dvzzv.log.dir=${ercorebatch.log.dir} -Xloggc:${ercorebatch.log.dir}/gc.log -Duser.language=it -Duser.region=it -Duser.country=it -Dvfjavawl=er.core.it
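For reference, the nominal generation sizes implied by these flags can be worked out by hand. The sketch below does the arithmetic, assuming the standard HotSpot meaning of `-XX:SurvivorRatio` (the ratio of eden to one survivor space, so the young generation is split into ratio + 2 parts). The class name `HeapLayout` is illustrative only. The JVM may round these sizes for alignment, and a collector with adaptive sizing can resize the survivor spaces at run time, so treat the numbers as a starting point.

```java
public class HeapLayout {
    // Sizes in MB, taken from the flags above.
    static final int HEAP_MB = 2048;      // -Xms2048m / -Xmx2048m
    static final int YOUNG_MB = 384;      // -XX:NewSize / -XX:MaxNewSize
    static final int SURVIVOR_RATIO = 4;  // -XX:SurvivorRatio: eden is 4x one survivor space

    /** One survivor space: young generation divided into (ratio + 2) equal parts. */
    static int survivorMb() {
        return YOUNG_MB / (SURVIVOR_RATIO + 2);
    }

    /** Eden: what is left of the young generation after the two survivor spaces. */
    static int edenMb() {
        return YOUNG_MB - 2 * survivorMb();
    }

    /** Old generation: total heap minus the young generation. */
    static int oldMb() {
        return HEAP_MB - YOUNG_MB;
    }

    public static void main(String[] args) {
        System.out.println("eden=" + edenMb() + "MB, survivor=2x" + survivorMb()
                + "MB, old=" + oldMb() + "MB");
    }
}
```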

The production environment is T5220 or T2000 hardware, 32 bit SPARC, running a Solaris 10 virtual machine. JBoss 5.0.1.GA, Java 1.6.0_17.

We set up a test environment consisting of 2 identical boxes, running the same software, but one using JBoss 4.0.5.GA and one using JBoss 5.0.1.GA. They are VMware VMs running on an HP ProLiant DL560 Gen8 with 4 x 2.2GHz Intel Xeon CPU E5-4620 and 64GB RAM. Guest VMs are 4 vCPU, 4096MB RAM, CentOS 6.4.

We found that we could reproduce the problem in our environment. The box running on 4.0.5 ran fine, but on JBoss 5.0.1.GA we saw the same strange GC behaviour. Performance can't be tested in our environment, since we don't have the same amount of load as production.

We don't think it's a memory leak, since after each major GC, the used heap size returns to the same size:

[Image: used heap size after each major GC]

Analysing heap dumps taken pre- and post-apocalypse, we discovered that the number of the following objects was different:

org.jboss.virtual.plugins.context.file.FileSystemContext

During the first hour, there are 8 of them, and after the apocalypse hits, we see between 100 and 800.

Other than that, the heap dumps look quite similar, and the top objects are either Java or JBoss objects (i.e. no application classes).

Setting -Djboss.vfs.forceVfsJar=true on our test environment fixed the problem (i.e. the strange GC behaviour disappeared), but when applied in production, both the strange GC pattern and the performance problem remained - although the GC times did not increase as much (to 0.3 seconds rather than 1.5 seconds).

In our test environment, we then deployed the same software in JBoss 5.1.0 and found the same behaviour as with 5.0.1.

So the conclusion at this point is that there is something happening in JBoss 5.x around the 60 / 90 minute mark which has an impact on both garbage collection and performance.

Update:

We tried upgrading the web services stack to jbossws-native-3.3.1, which fixed the problem in our test environment. However, when deploying to the next test environment (closer to the production environment), the problem was still there (albeit reduced).

Update:

We have resolved this by setting jboss.vfs.cache.TimedPolicyCaching.lifetime to a very large number equivalent to many years.
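As a rough illustration of what "many years" means as a property value, the sketch below builds the corresponding `-D` argument. It rests on two assumptions not confirmed here: that the lifetime is expressed in seconds, and that it is parsed as a 32-bit integer (which would cap it at about 68 years). The class `VfsCacheLifetime` and its methods are hypothetical helpers, not part of JBoss.

```java
public class VfsCacheLifetime {
    static final String PROPERTY = "jboss.vfs.cache.TimedPolicyCaching.lifetime";

    /** Converts years to seconds (assumed unit), refusing values that overflow an int. */
    static int yearsToSeconds(int years) {
        long seconds = years * 365L * 24 * 60 * 60;
        if (seconds > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(years + " years does not fit in an int of seconds");
        }
        return (int) seconds;
    }

    /** Builds the JVM argument to add to the server's startup options. */
    static String jvmArgument(int years) {
        return "-D" + PROPERTY + "=" + yearsToSeconds(years);
    }

    public static void main(String[] args) {
        System.out.println(jvmArgument(10));
    }
}
```

The resulting argument goes alongside the other `-D` options in the JVM settings.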

This feels like a workaround for a bug in JBoss. The default cache lifetime is 30 minutes (see org.jboss.util.TimedCachePolicy), and we saw problems after either 60 or 90 minutes.

The VFS cache implementation is CombinedVFSCache, and I think it's using a TimedVFSCache underneath.

It seems a better fix would be to change the cache implementation to a permanent cache, but we've wasted enough time on this problem and our workaround will have to do.

It is hard to determine the root cause of the problem just by looking at the GC graphs. How do the stacks look when this happens? Are there any hyperactive threads? Are there some nasty threads creating a huge pile of objects, forcing the garbage collector to work like hell to get rid of them? I think more analysis must be performed to determine the root cause of the problem.
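One way to look for hyperactive threads from inside the JVM is the standard `java.lang.management.ThreadMXBean`. The sketch below ranks live threads by accumulated CPU time and prints the top few stack frames of each. The class `BusyThreads` is illustrative only, and it would have to run inside the JBoss JVM (for example from a diagnostic servlet) to see the server's threads. Taking a few `jstack` thread dumps during the slow period is the more usual route.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class BusyThreads {
    /** Returns one report entry per live thread, busiest first, at most {@code limit} entries. */
    static List<String> report(int limit) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isThreadCpuTimeSupported();

        // Collect {threadId, cpuNanos}; cpuNanos is -1 when unavailable.
        List<long[]> byCpu = new ArrayList<long[]>();
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = cpuSupported ? threads.getThreadCpuTime(id) : -1;
            byCpu.add(new long[] { id, cpuNanos });
        }
        byCpu.sort((a, b) -> Long.compare(b[1], a[1]));

        List<String> lines = new ArrayList<String>();
        for (long[] entry : byCpu) {
            if (lines.size() >= limit) {
                break;
            }
            ThreadInfo info = threads.getThreadInfo(entry[0], 5); // top 5 stack frames
            if (info == null) {
                continue; // thread ended between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append(entry[1] / 1000000).append("ms ")
              .append(info.getThreadName()).append(" [").append(info.getThreadState()).append("]");
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n    at ").append(frame);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report(10)) {
            System.out.println(line);
        }
    }
}
```

Comparing two reports taken a minute apart shows which threads are accumulating CPU time, and the stack frames hint at what they are allocating.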

